ABSTRACT



The Hierarchical Memory Model (HMM) of computation is similar to the standard 
Random Access Machine (RAM) model except that the HMM has a non-uniform memory 
organized in a hierarchy of levels numbered 1 through h. The cost of accessing a memory 
location increases with the level number, and accesses to memory locations belonging to 
the same level cost the same. Formally, the cost of a single access to the memory location 
at address a is given by $\mu(a)$, where $\mu : \mathbb{N} \to \mathbb{N}$ is the memory cost function, and the h
distinct values of $\mu$ model the different levels of the memory hierarchy.

We study the problem of constructing and storing a binary search tree (BST) of minimum
cost, over a set of keys, with probabilities for successful and unsuccessful searches,
on the HMM with an arbitrary number of memory levels, and for the special case h = 2.

While the problem of constructing optimum binary search trees has been well studied
for the standard RAM model, the additional parameter $\mu$ for the HMM increases the
combinatorial complexity of the problem. We present two dynamic programming algorithms
to construct optimum BSTs bottom-up. These algorithms run efficiently under
some natural assumptions about the memory hierarchy. We also give an efficient algorithm
to construct a BST that is close to optimum, by modifying a well-known linear-time
approximation algorithm for the RAM model. We conjecture that the problem of constructing
an optimum BST for the HMM with an arbitrary memory cost function $\mu$ is
NP-complete.
To my father

"Results? Why, man, I have gotten lots of results! If I find 10,000 ways
something won't work, I haven't failed."

-- Thomas Alva Edison. (www.thomasedison.com)

CHAPTER 1

Introduction 



1.1 What is a binary search tree? 

For a set of n distinct keys $x_1, x_2, \ldots, x_n$ from a totally ordered universe
($x_1 \prec x_2 \prec \cdots \prec x_n$), a binary search tree (BST) T is an ordered, rooted binary tree with n internal
nodes. The internal nodes of the tree correspond to the keys $x_1$ through $x_n$ such that an
inorder traversal of the nodes visits the keys in order of precedence, i.e., in the order
$x_1, x_2, \ldots, x_n$. The external nodes correspond to intervals between the keys, i.e., the j-th
external node represents the set of elements between $x_{j-1}$ and $x_j$. Without ambiguity,
we identify the nodes of the tree by the corresponding keys.

For instance, a binary search tree on the set of integers {1, 2, 3, 5, 8, 13, 21} with
the natural ordering of integers could look like the tree in figure 1.1. The internal nodes
of the tree are labeled {1, 2, 3, 5, 8, 13, 21} and the external nodes (leaves) are labeled
A through H in order.

Let $T_{i,j}$ for $1 \le i \le j \le n$ denote a BST on the subset of keys from $x_i$ through $x_j$.
We define $T_{i+1,i}$ to be the unique BST over the empty subset of keys from $x_{i+1}$ through
$x_i$, which consists of a single external node with probability of access $q_i$. We will use T
to denote $T_{1,n}$.

A binary search tree with n internal nodes is stored in n locations in memory: each
memory location contains a key $x_i$ and two pointers to the memory locations containing
the left and right children of $x_i$. If the left (resp. right) subtree is empty, then the left
(resp. right) pointer is Nil.

In this section, we will restrict our attention to the standard RAM model of compu- 
tation. 




Figure 1.1 A binary search tree over the set {1, 2, 3, 5, 8, 13, 21}

1.1.1 Searching in a BST 

A search in $T_{i,j}$ proceeds recursively as follows. The search argument y is compared
with the root $x_k$ ($i \le k \le j$). If $y = x_k$, then the search terminates successfully.
Otherwise, if $y \prec x_k$ (resp. $y \succ x_k$), then the search proceeds recursively in the left
subtree, $T_{i,k-1}$ (resp. the right subtree, $T_{k+1,j}$); if the left subtree (resp. right subtree)
of $x_k$ is an external node, i.e., a leaf, then the search fails without visiting any other
nodes because $x_{k-1} \prec y \prec x_k$ (resp. $x_k \prec y \prec x_{k+1}$). (We adopt the convention that
$x_0 \prec y \prec x_1$ means $y \prec x_1$, and $x_n \prec y \prec x_{n+1}$ means $y \succ x_n$.)

The depth of an internal or external node v is the number of nodes on the path to
the node from the root, denoted by $\delta_T(v)$, or simply $\delta(v)$ when the tree T is implicit.
Hence, for instance, the depth of the root is 1. The cost of a successful or unsuccessful
search is the number of comparisons needed to determine the outcome. Therefore, the
cost of a successful search that terminates at some internal node $x_i$ is equal to the depth
of $x_i$, i.e., $\delta(x_i)$. The cost of an unsuccessful search that would have terminated at the
external node $z_j$ is one less than the depth of $z_j$, i.e., $\delta(z_j) - 1$.

So, for instance, the depth of the internal node labeled 8 in the tree of figure 1.1
is 3. A search for the key 8 would perform three comparisons, with the nodes labeled
13, 5, and 8, before terminating successfully. Therefore, the cost of a successful search
that terminates at the node labeled 8 is the same as the path length of the node, i.e., 3.
On the other hand, a search for the value 4 would perform comparisons with the nodes
labeled 13, 5, 1, and 3 in that order and then would terminate with failure, for a total of
four comparisons. This unsuccessful search would have visited the external node labeled
D; therefore, the cost of a search that terminates at D is one less than the depth of D,
i.e., 5 - 1 = 4.

Even though the external nodes are conceptually present, they are not necessary for 
implementing the BST data structure. If any subtree of an internal node is empty, then 
the pointer to that subtree is assumed to be Nil; it is not necessary to "visit" this empty 
subtree. 
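
The search procedure and its cost can be sketched in a few lines of Python. This is only an illustration: the names Node and search are hypothetical, and the tree shape is inferred from the comparison sequences described above for figure 1.1 (13, 5, 8 for the key 8, and 13, 5, 1, 3 for the value 4), since the figure itself is not reproduced here. External nodes are not stored; an empty subtree is represented by None, playing the role of Nil.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None    # None plays the role of Nil
    right: Optional["Node"] = None


def search(root: Optional[Node], y: int) -> Tuple[bool, int]:
    """Return (found, number of comparisons) for the search argument y."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1             # one comparison of y with node.key
        if y == node.key:
            return True, comparisons
        node = node.left if y < node.key else node.right
    # The search fell off the tree at an (implicit) external node.
    return False, comparisons


# Assumed shape of the tree of figure 1.1, inferred from the text.
tree = Node(13,
            Node(5,
                 Node(1, None, Node(3, Node(2), None)),
                 Node(8)),
            Node(21))

print(search(tree, 8))   # (True, 3)
print(search(tree, 4))   # (False, 4)
```

The comparison counts agree with the costs stated in the text: the depth of the node 8 for the successful search, and one less than the depth of the external node D for the unsuccessful one.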

1.1.2 Weighted binary search trees 

In the weighted case, we are also given the probability that the search argument y is
equal to some key $x_i$ for $1 \le i \le n$ and the probability that y lies between $x_j$ and $x_{j+1}$
for $0 \le j \le n$. Let $p_i$, for i = 1, 2, ..., n, denote the probability that $y = x_i$. Let $q_j$, for
j = 0, 1, ..., n, denote the probability that $x_j \prec y \prec x_{j+1}$. We have

    $\sum_{i=1}^{n} p_i + \sum_{j=0}^{n} q_j = 1.$



Define $w_{i,j}$ as

    $w_{i,j} = \sum_{k=i}^{j} p_k + \sum_{k=i-1}^{j} q_k.$    (1.1)

Therefore, $w_{1,n} = 1$, and $w_{i+1,i} = q_i$. (Note that this definition differs from the function
w(i,j) referred to by Knuth [Knu73]. Under definition (1.1), $w_{i,j}$ is the sum of the
probabilities associated with the subtree over the keys $x_i$ through $x_j$. Under Knuth's
definition, $w(i,j) = w_{i+1,j}$ is the sum of the probabilities associated with the keys $x_{i+1}$
through $x_j$.)

Recall that the cost of a successful search that terminates at the internal node $x_i$ is
$\delta(x_i)$, and the cost of an unsuccessful search that terminates at the external node $z_j$ is
$\delta(z_j) - 1$. We define the cost of T to be the expected cost of a search:

    $\mathrm{cost}(T) = \sum_{i=1}^{n} p_i \cdot \delta_T(x_i) + \sum_{j=0}^{n} q_j \cdot (\delta_T(z_j) - 1).$    (1.2)



In other words, the cost of T is the weighted sum of the depths of the internal and 
external nodes of T. 
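
Equation (1.2) can be evaluated directly from the shape of a tree. The sketch below assumes a tree encoded as nested tuples (left, k, right), with None standing for an external node, and uses integer weights (probabilities scaled by 100) so that the arithmetic is exact. The encoding, the function name expected_cost, and the sample weights are illustrative assumptions, not part of the original text.

```python
def expected_cost(tree, p, q):
    """Weighted search cost of a BST following equation (1.2).

    tree: nested tuple (left, k, right); None is an external node.
    p[1..n]: weights of successful searches (p[0] is unused padding).
    q[0..n]: weights of unsuccessful searches.
    """
    total = 0
    next_external = 0                # z_0, ..., z_n are met left to right

    def visit(node, depth):
        nonlocal total, next_external
        if node is None:
            # Unsuccessful search: the cost is one less than the depth.
            total += q[next_external] * (depth - 1)
            next_external += 1
            return
        left, k, right = node
        visit(left, depth + 1)
        total += p[k] * depth        # successful search: cost is the depth
        visit(right, depth + 1)

    visit(tree, 1)                   # the root has depth 1
    return total


# Hypothetical instance with n = 5 keys; weights are percentages.
p = [None, 15, 10, 5, 10, 20]
q = [5, 10, 5, 5, 5, 10]
tree = ((None, 1, None), 2, (((None, 3, None), 4, None), 5, None))
print(expected_cost(tree, p, q))     # 235, i.e. an expected cost of 2.35
```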

An optimum binary search tree T* is one with minimum cost. Let $T^*_{i,j}$ denote the
optimum BST over the subset of keys from $x_i$ through $x_j$ for all i, j such that
$1 \le i \le j \le n$; $T^*_{i+1,i}$ denotes the unique optimum BST consisting of an external node with
probability of access $q_i$.

1.2 Why study binary search trees? 

The binary search tree is a fundamental data structure that supports the operations 
of inserting and deleting keys, as well as searching for a key. The straightforward imple- 
mentation of a BST is adequate and efficient for the static case when the probabilities 
of accessing keys are known a priori or can at least be estimated. More complicated 
implementations, such as red-black trees [CLR90], AVL trees [AVL62, Knu73], and splay
trees [ST85], guarantee that a sequence of operations, including insertions and deletions,
can be executed efficiently. 

In addition, the binary search tree also serves as a model for studying the performance 
of algorithms like Quicksort [Knu73, CLR90]. The recursive execution of Quicksort
corresponds to a binary tree where each node represents a partition of the elements to 
be sorted into left and right parts, consisting of elements that are respectively less than 
and greater than the pivot element. The running time of Quicksort is the sum of the 
work done by the algorithm corresponding to each node of this recursion tree. 

A binary search tree also arises implicitly in the context of binary search. The BST 
corresponding to binary search achieves the theoretical minimum number of comparisons 
that are necessary to search using only key comparisons. 

When an explicit BST is used as a data structure, we want to construct one with 
minimum cost. When studying the performance of QUICKSORT, we want to prove lower 
bounds on the cost and hence the running time. Therefore, the problem of constructing 
optimum BSTs is of considerable interest. 



1.3 Overview 

In chapter 2, we survey background work on binary search trees and computational
models for non-uniform memory computers.

In chapter 3, we give algorithms for constructing optimum binary search trees. In
section 3.3, we consider the most general variant of the HMM model, with an arbitrary
number of memory levels. We present two dynamic programming algorithms and a top-down
algorithm to construct optimum BSTs on the HMM. In section 3.4, we consider
the special case of the HMM model with only two memory levels. For this model, we
present a dynamic programming algorithm to construct optimum BSTs in section 3.4.1,
and in section 3.4.2, a linear-time heuristic to construct a BST close to the optimum.

Finally, we conclude with a summary of our results and a discussion of open problems
in the final chapter.



CHAPTER 2 

Background and Related Work 



In this chapter, we survey related work on the problem of constructing optimum
binary search trees, and on computational models for hierarchical memory. In section
2.1, we discuss the optimum binary search tree problem and related problems. In section
2.2, we discuss memory effects in modern computers and present arguments for better
theoretical models. In section 2.2.2, we survey related work on designing data structures
and algorithms, and in section 2.2.4, we discuss proposed models of computation for
hierarchical-memory computers.

2.1 Binary search trees and related problems 

The binary search tree has been studied extensively in different contexts. In sections
2.1.1 through 2.1.5, we will summarize previous work on the following related problems
that have been studied on the RAM model of computation:

• constructing a binary search tree such that the expected cost of a search is mini- 
mized; 

• constructing an alphabetic tree such that the sum of the weighted path lengths of 
the external nodes is minimized; 

• constructing a prefix-free code tree with no restriction on the lexicographic order 
of the nodes such that the weighted path lengths of all nodes is minimized; 

• constructing a binary search tree close to optimum by an efficient heuristic; 

• constructing an optimal binary decision tree. 



2.1.1 Constructing optimum binary search trees on the RAM 
2.1.1.1 Dynamic programming algorithms 



Theorem 1 (Knuth [Knu71], [Knu73]). An optimum BST can be constructed by a
dynamic programming algorithm that runs in $O(n^2)$ time and $O(n^2)$ space.

Proof: By the principle of optimality, a binary search tree T* is optimum if and only if 
each subtree of T* is optimum. The standard dynamic programming algorithm proceeds 
as follows: 

Recall that $\mathrm{cost}(T^*_{i,j})$ denotes the cost of an optimum BST $T^*_{i,j}$ over the keys $x_i, x_{i+1},
\ldots, x_j$ and the corresponding probabilities $p_i, p_{i+1}, \ldots, p_j$ and $q_{i-1}, q_i, \ldots, q_j$. By the
principle of optimality and the definition of the cost function in equation (1.2),

    $\mathrm{cost}(T^*_{i,j}) = w_{i,j} + \min_{i \le k \le j} \left( \mathrm{cost}(T^*_{i,k-1}) + \mathrm{cost}(T^*_{k+1,j}) \right)$    for $i \le j$

    $\mathrm{cost}(T^*_{i+1,i}) = w_{i+1,i} = q_i$    (2.1)

Recurrence (2.1) suggests a dynamic programming algorithm, Algorithm K1 in
figure 2.1, that constructs optimum subtrees bottom-up. Algorithm K1 is the standard
dynamic programming algorithm. For each d from 0 through n - 1, and for each i, j
such that j - i = d, the algorithm evaluates the cost of a subtree with $x_k$ as the root,
for every possible choice of k between i and j, and selects the one for which this cost is
minimized.

Algorithm K1 constructs arrays c and r, such that c[i,j] is the cost of an optimum
BST $T^*_{i,j}$ over the subset of keys from $x_i$ through $x_j$ and r[i,j] is the index of the root of
such an optimum BST. The structure of the tree can be retrieved in O(n) time from the
array r at the end of the algorithm as follows. Let T[i,j] denote the optimum subtree
constructed by Algorithm K1 and represented implicitly using the array r. The index
of the root of this subtree is given by the array entry r[i,j]. Recursively, the left and
right subtrees of the root are T[i, r[i,j] - 1] and T[r[i,j] + 1, j] respectively.

For each fixed d and i, the algorithm takes O(d) time to evaluate the choice of $x_k$ as
the root for all k such that $i \le k \le j = i + d$, and hence, $\sum_{d=0}^{n-1} \sum_{i=1}^{n-d} O(d) = O(n^3)$ time
overall.
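
The cubic-time procedure and the recovery of the tree from the array r can be sketched in Python as follows. This is an illustrative transcription of recurrence (2.1), not the author's code: the function names, the prefix-sum arrays P and Q used to evaluate $w_{i,j}$ in constant time, the nested-tuple tree encoding, and the sample weights (probabilities scaled by 100 so that the arithmetic is exact) are all assumptions.

```python
import math


def optimum_bst_k1(p, q):
    """Tables (c, r) of recurrence (2.1); p[1..n], q[0..n], p[0] unused."""
    n = len(q) - 1
    P = [0] * (n + 1)                # P[k] = p_1 + ... + p_k
    Q = [0] * (n + 2)                # Q[k + 1] = q_0 + ... + q_k
    for k in range(1, n + 1):
        P[k] = P[k - 1] + p[k]
    for k in range(n + 1):
        Q[k + 1] = Q[k] + q[k]

    def w(i, j):                     # w_{i,j} as in definition (1.1)
        return (P[j] - P[i - 1]) + (Q[j + 1] - Q[i - 1])

    # Rows are indexed 1..n+1 and columns 0..n (row 0 is unused).
    c = [[0] * (n + 1) for _ in range(n + 2)]
    r = [[None] * (n + 1) for _ in range(n + 2)]
    for i in range(n + 1):
        c[i + 1][i] = q[i]           # empty key range: one external node
    for d in range(n):
        for i in range(1, n - d + 1):
            j = i + d
            c[i][j] = math.inf
            for k in range(i, j + 1):        # try every root x_k
                cand = w(i, j) + c[i][k - 1] + c[k + 1][j]
                if cand < c[i][j]:           # strict: keeps the smallest k
                    c[i][j], r[i][j] = cand, k
    return c, r


def build_tree(r, i, j):
    """Nested tuple (left, root index, right); None is an external node."""
    if j < i:
        return None
    k = r[i][j]
    return (build_tree(r, i, k - 1), k, build_tree(r, k + 1, j))


# Hypothetical instance with n = 5 keys; weights are percentages.
p = [None, 15, 10, 5, 10, 20]
q = [5, 10, 5, 5, 5, 10]
c, r = optimum_bst_k1(p, q)
print(c[1][5], r[1][5])              # 275 2
print(build_tree(r, 1, 5))
```

Because the base case $c[i+1, i] = q_i$ is taken literally from (2.1), the value c[1][n] computed here equals the cost of definition (1.2) plus $q_0 + \cdots + q_n$ (here 235 + 40); this constant offset does not affect which tree is optimum.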

    ALGORITHM K1([p_1..p_n], [q_0..q_n]):
        (Initialization phase.)
        (An optimum BST over the empty subset of keys from x_{i+1} through x_i
         consists of just the external node with probability q_i.)
        (The root of this subtree is undefined.)
        for i := 0 to n
            c[i+1, i] ← w_{i+1,i} = q_i
            r[i+1, i] ← Nil
        for d := 0 to n - 1
            for i := 1 to n - d
                j ← i + d
                (Initially, the optimum subtree T*_{i,j} is unknown.)
                c[i, j] ← ∞
                for k := i to j
                    Let T' be the tree with x_k at the root, and T*_{i,k-1} and
                    T*_{k+1,j} as the left and right subtrees, respectively.
                    Let c' be the cost of T':
                    c' ← w_{i,j} + c[i, k-1] + c[k+1, j]
                    (Is T' better than the minimum-cost tree so far?)
                    if c' < c[i, j]
                        r[i, j] ← k
                        c[i, j] ← c'

Figure 2.1 Algorithm K1

    ALGORITHM K2([p_1..p_n], [q_0..q_n]):
        (Initialization phase.)
        for i := 0 to n
            c[i+1, i] ← w_{i+1,i} = q_i
            r[i+1, i] ← Nil
        for d := 0 to n - 1
            for i := 1 to n - d
                j ← i + d
                c[i, j] ← ∞
                (When d = 0, the entries r[i, j-1] and r[i+1, j] are Nil and the
                 only candidate root is k = i.)
                for k := r[i, j-1] to r[i+1, j]
                    Let T' be the tree with x_k at the root, and T*_{i,k-1} and
                    T*_{k+1,j} as the left and right subtrees, respectively.
                    c' ← w_{i,j} + c[i, k-1] + c[k+1, j]
                    if c' < c[i, j]
                        r[i, j] ← k
                        c[i, j] ← c'

Figure 2.2 Algorithm K2

Knuth [Knu71] showed that the following monotonicity principle can be used to reduce
the time complexity to $O(n^2)$: for all i, j, $1 \le i \le j \le n$, let R(i,j) denote the index
of the root of an optimum BST over the keys $x_i, x_{i+1}, \ldots, x_j$ (if more than one root is
optimum, let R(i,j) be the smallest such index); then

    $R(i, j-1) \le R(i, j) \le R(i+1, j).$    (2.2)



Therefore, the innermost loop in Algorithm K1 can be modified to produce
Algorithm K2 (figure 2.2) with improved running time.

Since (j - 1) - i = j - (i + 1) = d - 1 whenever j - i = d, the values of r[i, j-1] and
r[i+1, j] are available during the iteration when j - i = d. The number of times that the
body of the innermost loop in Algorithm K2 is executed is r[i+1, j] - r[i, j-1] + 1
when j - i = d. Therefore, the running time of Algorithm K2 is proportional to

    $\sum_{d=0}^{n-1} \sum_{i=1}^{n-d} \left( r[i+1, j] - r[i, j-1] + 1 \right)$    where j = i + d

    $= \sum_{d=0}^{n-1} \left( r[n-d+1, n] - r[1, d] + n - d \right)$    since the inner sum telescopes

    $\le \sum_{d=0}^{n-1} (2n - d)$    since $r[n-d+1, n] - r[1, d] \le n - 1$

    $= O(n^2).$

The use of the monotonicity principle above is in fact an application of the general
technique due to Yao [Yao82] to speed up dynamic programming under some special
conditions. (See subsection 2.1.1.2 below.)

The space required by both algorithms for the tables r and c is $O(n^2)$. □
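
A minimal Python sketch of the quadratic-time variant follows. It restricts the candidate roots to the range given by the monotonicity principle (2.2) and counts how often the innermost loop body runs. The function name, the prefix-sum arrays, the step counter, the explicit handling of d = 0 (where only k = i is tried), and the integer sample weights are assumptions made for illustration.

```python
import math


def optimum_bst_k2(p, q):
    """Recurrence (2.1) with the root range restricted as in Algorithm K2.

    p[1..n], q[0..n] (p[0] is unused padding). Returns (c, r, steps), where
    steps counts executions of the innermost loop body.
    """
    n = len(q) - 1
    P = [0] * (n + 1)                # P[k] = p_1 + ... + p_k
    Q = [0] * (n + 2)                # Q[k + 1] = q_0 + ... + q_k
    for k in range(1, n + 1):
        P[k] = P[k - 1] + p[k]
    for k in range(n + 1):
        Q[k + 1] = Q[k] + q[k]

    c = [[0] * (n + 1) for _ in range(n + 2)]
    r = [[None] * (n + 1) for _ in range(n + 2)]
    for i in range(n + 1):
        c[i + 1][i] = q[i]           # empty key range: one external node
    steps = 0
    for d in range(n):
        for i in range(1, n - d + 1):
            j = i + d
            w = (P[j] - P[i - 1]) + (Q[j + 1] - Q[i - 1])
            # For d == 0 the neighbouring entries are Nil, so try only k = i.
            lo, hi = (i, i) if d == 0 else (r[i][j - 1], r[i + 1][j])
            c[i][j] = math.inf
            for k in range(lo, hi + 1):
                steps += 1
                cand = w + c[i][k - 1] + c[k + 1][j]
                if cand < c[i][j]:   # strict: keeps the smallest optimum k
                    c[i][j], r[i][j] = cand, k
    return c, r, steps


# Hypothetical instance with n = 5 keys; weights are percentages.
p = [None, 15, 10, 5, 10, 20]
q = [5, 10, 5, 5, 5, 10]
c, r, steps = optimum_bst_k2(p, q)
print(c[1][5], r[1][5])              # 275 2
print(steps)                         # number of innermost iterations
```

On this instance the restricted search returns the same cost and root as an exhaustive search over all k between i and j, while the step count stays within the bound $\sum_{d=0}^{n-1} (2n - d)$ derived above.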

2.1.1.2 Speed-up in dynamic programming 

For the sake of completeness, we reproduce below results due to Yao [Yao82].
Consider the problem of computing the value of c(1,n) for the function c() defined by
the following recurrence

    $c(i,j) = w(i,j) + \min_{i \le k \le j} \left( c(i,k-1) + c(k+1,j) \right)$    for $1 \le i \le j \le n$

    $c(i+1,i) = q_i$    (2.3)

where w() is some function and $q_i$ is a constant for $1 \le i \le n$. The form of the recurrence
suggests a simple dynamic programming algorithm that computes c(i,j) from c(i,k-1)
and c(k+1,j) for all k from i through j. This algorithm spends O(j - i) time computing
the optimum value of c(i,j) for every pair i, j, such that $1 \le i \le j \le n$, for a total
running time of $\sum_{i=1}^{n} \sum_{j=i}^{n} (j - i) = O(n^3)$.

The function w(i,j) satisfies the concave quadrangle inequality (QI) if:

    $w(i,j) + w(i',j') \le w(i',j) + w(i,j')$    (2.4)

for all i, i', j, j' such that $i \le i' \le j \le j'$. In addition, w(i,j) is monotone with respect
to set inclusion of intervals if $w(i,j) \le w(k,l)$ whenever $[i,j] \subseteq [k,l]$, i.e., $k \le i \le j \le l$.
Let $c_k(i,j)$ denote w(i,j) + c(i,k-1) + c(k+1,j) for each k, $i \le k \le j$. Let K(i,j)
denote the maximum k for which the optimum value of c(i,j) is achieved in recurrence
(2.3), i.e., for $i \le j$,

    $K(i,j) = \max\{ k \mid c_k(i,j) = c(i,j) \}.$

Hence, K(i,i) = i.

Lemma 2 (Yao [Yao82]). If w(i,j) is monotone and satisfies the concave quadrangle
inequality (2.4), then the function c(i,j) defined by recurrence (2.3) also satisfies the
concave QI, i.e.,

    $c(i,j) + c(i',j') \le c(i',j) + c(i,j')$

for all i, i', j, j' such that $i \le i' \le j \le j'$.

Proof (Mehlhorn [Meh84]): Consider i, i', j, j' such that $1 \le i \le i' \le j \le j' \le n$.
The proof of the lemma is by induction on l = j' - i.

Base cases: The case l = 0 is trivial. If l = 1, then either i = i' or j = j', so the
inequality

    $c(i,j) + c(i',j') \le c(i',j) + c(i,j')$

is trivially true.

Inductive step: Consider the two cases: i' = j and i' < j.

Case 1: i' = j. In this case, the concave QI reduces to the inequality:

    $c(i,j) + c(j,j') \le c(i,j') + w(j,j).$

Let k = K(i,j'). Clearly, $i \le k \le j'$.

Case 1a: $k + 1 \le j$.

    $c(i,j) + c(j,j') \le w(i,j) + c(i,k-1) + c(k+1,j) + c(j,j')$    by the definition of c(i,j)
    $\le w(i,j') + c(i,k-1) + c(k+1,j) + c(j,j')$    by the monotonicity of w()

Now if $k + 1 \le j$, then from the induction hypothesis, $c(k+1,j) + c(j,j') \le c(k+1,j') + w(j,j)$.
Therefore,

    $c(i,j) + c(j,j') \le w(i,j') + c(i,k-1) + c(k+1,j') + w(j,j)$
    $= c(i,j') + w(j,j)$

because k = K(i,j'), and by definition of c(i,j').

Case 1b: $k \ge j$.

    $c(i,j) + c(j,j') \le c(i,j) + w(j,j') + c(j,k-1) + c(k+1,j')$    by the definition of c(j,j')
    $\le c(i,j) + w(i,j') + c(j,k-1) + c(k+1,j')$    by the monotonicity of w()

Now if $k \ge j$, then from the induction hypothesis, $c(i,j) + c(j,k-1) \le c(i,k-1) + w(j,j)$.
Therefore,

    $c(i,j) + c(j,j') \le w(i,j') + c(i,k-1) + w(j,j) + c(k+1,j')$
    $= c(i,j') + w(j,j)$

by the definition of c(i,j').

Case 2: i' < j. Let y = K(i',j) and z = K(i,j').

Case 2a: $z \le y$. Note that $i \le z \le y \le j$.

    $c(i',j') + c(i,j) \le c_y(i',j') + c_z(i,j)$
    $= (w(i',j') + c(i',y-1) + c(y+1,j')) + (w(i,j) + c(i,z-1) + c(z+1,j))$
    $\le (w(i,j') + w(i',j)) + (c(i',y-1) + c(i,z-1) + c(z+1,j) + c(y+1,j'))$
        from the concave QI for w
    $\le (w(i,j') + w(i',j)) + (c(i',y-1) + c(i,z-1) + c(y+1,j) + c(z+1,j'))$
        from the induction hypothesis, i.e., the concave QI applied to $z \le y \le j \le j'$
    $= c(i,j') + c(i',j)$
        by definition of c(i,j') and c(i',j).

The first step holds because y and z are feasible, though not necessarily optimum, choices
of k for c(i',j') and c(i,j), respectively.

Case 2b: y < z. This case is symmetric to case 2a above. □
Theorem 3 (Yao [Yao82]). If the function w(i,j) is monotone and satisfies the concave
quadrangle inequality, then

    $K(i, j-1) \le K(i,j) \le K(i+1, j).$

Proof (Mehlhorn [Meh84]): The theorem is trivially true when j = i + 1 because
$i \le K(i,j) \le j$. We will prove $K(i,j-1) \le K(i,j)$ for the case i < j - 1, by induction
on j - i.

Recall that K(i,j-1) is the largest index k that achieves the minimum value of
c(i,j-1) = w(i,j-1) + c(i,k-1) + c(k+1,j-1) (cf. equation (2.3)). Therefore, it
suffices to show that

    $c_{k'}(i,j-1) \le c_k(i,j-1) \implies c_{k'}(i,j) \le c_k(i,j)$

for all $i \le k \le k' \le j$. We prove the stronger inequality

    $c_k(i,j-1) - c_{k'}(i,j-1) \le c_k(i,j) - c_{k'}(i,j)$

which is equivalent to

    $c_k(i,j-1) + c_{k'}(i,j) \le c_{k'}(i,j-1) + c_k(i,j).$

The last inequality above is expanded to

    $c(i,k-1) + c(k+1,j-1) + c(i,k'-1) + c(k'+1,j)$
    $\le c(i,k'-1) + c(k'+1,j-1) + c(i,k-1) + c(k+1,j)$

or

    $c(k+1,j-1) + c(k'+1,j) \le c(k'+1,j-1) + c(k+1,j).$

But this is simply the concave quadrangle inequality for the function c(i,j) for
$k \le k' \le j-1 \le j$, which is true by the induction hypothesis. □

As a consequence of Theorem 3, if we compute c(i,j) by diagonals, in order of increasing
values of j - i, then we can limit our search for the optimum value of k to the range
from K(i,j-1) through K(i+1,j). The cost of computing all entries on one diagonal
where j = i + d is

    $\sum_{i=1}^{n-d} \left( K(i+1,j) - K(i,j-1) + 1 \right)$
    $= K(n-d+1, n) - K(1,d) + n - d$
    $\le (n - 1) + (n - d)$
    $< 2n.$

The speed-up technique in this section is used to improve the running time of the
standard dynamic programming algorithm to compute optimum BSTs. It is easy to see
that the parameters of the optimum BST problem satisfy the conditions required by
Theorem 3.
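
As a sanity check of the last remark, the two conditions of Theorem 3 can be tested by brute force on a small instance. The sketch below is illustrative only: the helper names make_w, is_monotone, and satisfies_concave_qi are hypothetical, w is built from definition (1.1), and integer weights are used so that the comparisons are exact. For this additive weight function the quadrangle inequality (2.4) in fact holds with equality.

```python
def make_w(p, q):
    """w(i, j) = p_i + ... + p_j + q_{i-1} + ... + q_j, for 1 <= i <= j <= n."""
    n = len(q) - 1
    P = [0] * (n + 1)                # P[k] = p_1 + ... + p_k
    Q = [0] * (n + 2)                # Q[k + 1] = q_0 + ... + q_k
    for k in range(1, n + 1):
        P[k] = P[k - 1] + p[k]
    for k in range(n + 1):
        Q[k + 1] = Q[k] + q[k]
    return lambda i, j: (P[j] - P[i - 1]) + (Q[j + 1] - Q[i - 1])


def is_monotone(w, n):
    """[i, j] contained in [k, l] implies w(i, j) <= w(k, l)."""
    return all(w(i, j) <= w(k, l)
               for k in range(1, n + 1) for l in range(k, n + 1)
               for i in range(k, l + 1) for j in range(i, l + 1))


def satisfies_concave_qi(w, n):
    """Inequality (2.4) for all i <= i2 <= j <= j2 (i2, j2 stand for i', j')."""
    return all(w(i, j) + w(i2, j2) <= w(i2, j) + w(i, j2)
               for i in range(1, n + 1) for i2 in range(i, n + 1)
               for j in range(i2, n + 1) for j2 in range(j, n + 1))


# Hypothetical instance with n = 5 keys; weights are percentages.
p = [None, 15, 10, 5, 10, 20]
q = [5, 10, 5, 5, 5, 10]
w = make_w(p, q)
print(is_monotone(w, 5), satisfies_concave_qi(w, 5))   # True True
```

A brute-force check of this kind takes time polynomial in n of high degree, so it is only useful for small test instances, not as part of the algorithm.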

2.1.2 Alphabetic trees 

The special case of the problem of constructing an optimum BST when $p_1 = p_2 =
\cdots = p_n = 0$ is known as the alphabetic tree problem. This problem arises in the context
of constructing optimum binary code trees. A binary codeword is a string of 0's and 1's.
A prefix-free binary code is a sequence of binary codewords such that no codeword is a
prefix of another. Corresponding to a prefix-free code with n + 1 codewords, there is a
rooted binary tree with n internal nodes and n + 1 external nodes where the codewords
correspond to the external nodes of the tree.

In the alphabetic tree problem, we require that the codewords at the external nodes
appear in order from left to right. Taking the left branch of the tree stands for a 0 bit
and taking the right branch stands for a 1 bit in the codeword; thus, a path in the tree
from the root to the j-th external node represents the bits in the j-th codeword. This
method of coding preserves the lexicographic order of messages. The probability $q_j$ of
the j-th codeword is the likelihood that the symbol corresponding to that codeword will
appear in any message. Thus, in this problem, $p_1 = p_2 = \cdots = p_n = 0$ and $\sum_{j=0}^{n} q_j = 1$.

Hu and Tucker [HT71] developed a two-phase algorithm that constructs an optimum
alphabetic tree. In the first phase, starting with a sequence of n + 1 nodes, pairs of nodes
are recursively combined into a single tree to obtain an assignment of level numbers to
the nodes. The tree constructed in the first phase does not necessarily have the leaves in
order. In the second phase, the nodes are recombined into a tree where the nodes are now
in lexicographic order and the depth of a node is the same as the level number assigned
to it in the first phase. It is non-trivial to prove that there exists an optimum alphabetic
tree with the external nodes at the same depths as the level numbers constructed in the
first phase.

The algorithm uses a priority queue with at most n + 1 elements on which it performs
O(n) operations. With the appropriate implementation, such as a leftist tree [Knu73] or
a Fibonacci heap [CLR90], the algorithm requires O(n log n) time and O(n) space.

2.1.3 Huffman trees 

If we relax the condition in the alphabetic tree problem that the codewords should 
be in lexicographic order, then the problem of constructing an optimum prefix-free code 
is the Huffman tree problem. Huffman's classic result [Huf52] is that a simple greedy
algorithm, running in time O(nlogn), suffices to construct a minimum-cost code tree. 
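
The greedy algorithm repeatedly merges the two subtrees of smallest total weight. A minimal sketch using a binary heap as the priority queue is shown below; the function name huffman_cost and the sample weights are illustrative assumptions, and the sketch computes only the cost of the code tree (the weighted path length of its external nodes), not the tree itself.

```python
import heapq


def huffman_cost(weights):
    """Minimum weighted external path length of a binary code tree."""
    heap = list(weights)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # the two lightest subtrees
        b = heapq.heappop(heap)
        # Merging pushes every leaf under the new node one level deeper,
        # which adds the combined weight a + b to the total cost.
        cost += a + b
        heapq.heappush(heap, a + b)
    return cost


print(huffman_cost([45, 13, 12, 16, 9, 5]))   # 224
```

Each of the n merges performs a constant number of heap operations, which is where the O(n log n) running time comes from.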

2.1.4 Nearly optimum search trees 



The best known algorithm, Algorithm K2 due to Knuth [Knu71], to construct an
optimum search tree requires $O(n^2)$ time and space (Theorem 1). If we are willing to
sacrifice optimality for efficiency, then we can use a simple linear-time heuristic due to
Mehlhorn [Meh84] to construct a tree T that is not too far from optimum. In fact, if T*
is a tree with minimum cost, then

    $\mathrm{cost}(T) - \mathrm{cost}(T^*) \le \lg(\mathrm{cost}(T^*)) \approx \lg H$

where $H = \sum_{i=1}^{n} p_i \lg(1/p_i) + \sum_{j=0}^{n} q_j \lg(1/q_j)$ is the entropy of the probability
distribution.
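
For concreteness, the entropy H in the bound above can be computed as follows. This is a small illustrative sketch: the function name is hypothetical, p and q are plain lists of probabilities, and the usual convention that a term with probability 0 contributes nothing is an assumption not stated in the text.

```python
import math


def entropy(p, q):
    """H = sum of x * lg(1/x) over all probabilities x in p and q."""
    # Terms with x == 0 are skipped (assumed convention: 0 * lg(1/0) = 0).
    return sum(x * math.log2(1 / x) for x in list(p) + list(q) if x > 0)


# One key with p_1 = 1/4 and two gaps with q_0 = 1/2, q_1 = 1/4.
print(entropy([0.25], [0.5, 0.25]))   # 1.5
```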

2.1.5 Optimal binary decision trees 

We remark that the related problem of constructing an optimal binary decision tree is
known to be NP-complete. Hyafil and Rivest [HR76] proved that the following problem
is NP-hard:

Problem 4. Let $S = \{s_1, s_2, \ldots, s_n\}$ be a finite set of objects and let $T = \{t_1, t_2, \ldots,
t_m\}$ be a finite set of tests. For each test $t_i$ and object $s_j$, $1 \le i \le m$ and $1 \le j \le n$, we
have either $t_i(s_j)$ = True or $t_i(s_j)$ = False. Construct an identification procedure for
the objects in S such that the expected number of tests required to completely identify
an element of S is minimal. In other words, construct a binary decision tree with the
tests at the internal nodes and the objects in S at the external nodes, such that the sum
of the path lengths of the external nodes is minimized.

The authors showed, via a reduction from Exact Cover by 3-Sets (X3C) [GJ79],
that the optimal binary decision tree problem remains NP-hard even when the tests are
all subsets of S of size 3 and $t_i(s_j)$ = True if and only if $s_j$ is an element of set $t_i$.

For more details on the optimum binary search tree problem and related problems,
we refer the reader to the excellent survey article by S. V. Nagaraj [Nag97].



2.2 Models of computation 



The Random Access Machine (RAM) [Pap95, BC94] is used most often in the design
and analysis of algorithms.

2.2.1 The need for an alternative to the RAM model 

The RAM is a sequential model of computation. It consists of a single processor with a 
predetermined set of instructions. Different variants of the RAM model assume different 
instruction sets; for instance, the real RAM [PS85] can perform exact arithmetic on real
numbers. See also Louis Mak's Ph.D. thesis [Mak95].

In the RAM model, memory is organized as a potentially unbounded array of loca- 
tions, numbered 1, 2, 3, . . ., each of which can store an arbitrarily large integer value. 
On the RAM, the memory organization is uniform; i.e., it takes the same amount of time 
to access any location in memory. 

While the RAM model serves to approximate a real computer fairly well, in some 
cases, it has been observed empirically that algorithms (and data structures) behave 
much worse than predicted on the RAM model: their running times are substantially 
larger than what even a careful analysis on the RAM model would predict because of 
memory effects such as paging and caching. In the following subsections, we review 
the hierarchical memory organization of modern computers, and how it leads to memory 
effects so that the cost of accessing memory becomes a significant part of the total running 
time of an algorithm. We survey empirical observations of these memory effects, and the
study of data structures and algorithms that attempt to overcome bottlenecks due to 
slow memory. 

2.2.1.1 Modern computer organization 

Modern computers have a hierarchical memory organization [HP96]. Memory is organized
into levels such as the processor's registers, the cache (primary and secondary),
main memory, secondary storage, and even distributed memory.

The first few levels of the memory hierarchy comprising the CPU registers, cache, and 
main memory are realized in silicon components, i.e., hardware devices such as integrated 
circuits. This type of fast memory is called "internal" storage, while the slower magnetic 
disks, CD-ROMs, and tapes used for realizing secondary and tertiary storage comprise 
the "external" storage. 

Registers have the smallest access time, and magnetic disks and tapes are the slowest. 
Typically, the memory in one level is an order of magnitude faster than in the next level. 
So, for instance, access times for registers and cache memory are a few nanoseconds, 
while accessing main memory takes tens of nanoseconds. 

The sizes (numbers of memory locations) of the levels also increase by an order of 
magnitude from one level to the next. So, for instance, typical cache sizes are measured 
in kilobytes while main memory sizes are of the order of megabytes and larger. The 
reason for these differences is that faster memory is more expensive to manufacture and 
therefore is available in smaller quantities. 

Most multi-programmed systems allow the simultaneous execution of programs in a 
time-sharing fashion even when the sum of the memory requirements of the programs 
exceeds the amount of physical main memory available. Such systems implement virtual 
memory: not all data items referenced by a program need to reside in main memory. 
The virtual address space, which is much larger than the real address space, is usually 
partitioned into pages. Pages can reside either in main memory or on disk. When the 
processor references an address belonging to a page not currently in the main memory, 
the page must be loaded from disk into main memory. Therefore, the time to access a 
memory location also depends on whether the corresponding page of virtual memory is 
currently in main memory. 
Consequently, the memory organization is highly non-uniform, and the assumption
of uniform memory cost in the RAM model is unrealistic.

2.2.1.2 Locality of reference 

Many algorithms exhibit the phenomenon of spatial and temporal locality [Smi82].
Data items are accessed in regular patterns so that the next item to be accessed is very 
likely to be one that is stored close to the last few items accessed. This phenomenon 
is called spatial locality. It occurs because data items that are logically "close" to each 
other also tend to be stored close together in memory. For instance, an array is a typical 
data structure used to represent a list of related items of the same type. Consecutive 
array elements are also stored in adjacent memory locations. (See, however, Chatterjee
et al. [CJLM99] for a study of the advantage of a nonlinear layout of arrays in memory.
Also, architectures with interleaved memory store consecutive array elements on different 
memory devices to facilitate parallel or pipelined access to a block of addresses.) 

A data item that is accessed at any time is likely to be accessed again in the near 
future. For example, the index variable in a loop is probably also used in the body of the 
loop. Therefore, during the execution of the loop, the variable is accessed several times 
in quick succession. This is the phenomenon of temporal locality. 

In addition, the hardware architecture mandates that the processor can operate only 
on data present in its registers. Therefore, executing an operation requires extra time 
to move the operands into registers and store the result back to free up the registers 
for the next operation. Typically, data can be moved only between adjacent levels in 
the memory hierarchy, such as between the registers and the primary cache, cache and 
main memory, and the main memory and secondary storage, but not directly between 
the registers and secondary storage. 

Therefore, an algorithm designer must make efficient use of available memory, so that 
data is available in the fastest possible memory location whenever it is required. Of 
course, moving data around involves extra overhead. The memory allocation problem is 
complicated by the dynamic nature of many algorithms. 



2.2.1.3 Memory effects 

The effects of caches on the performance of algorithms have been observed in a number 
of contexts. Smith [Smi82] presented a large number of empirical results obtained by
simulating the data access patterns of real programs on different cache architectures.
LaMarca and Ladner [LL99] investigated the effect of caches on the performance of
sorting algorithms, both experimentally and analytically. The authors showed how to 
restructure MergeSort, QuickSort, and HeapSort to improve the utilization of the 
cache and reduce the execution time of these algorithms. Their theoretical prediction of 
cache misses incurred closely matches the empirically observed performance. 

LaMarca and Ladner [LL96] also investigated empirically the performance of heap
implementations on different architectures. They presented optimizations to reduce the 
cache misses incurred by heaps and gave empirical data about how their optimizations 
affected overall performance on a number of different architectures. 

The performance of several algorithms such as matrix transpositions and FFT on 
the virtual memory model was studied by Aggarwal and Chandra [AC88]. The authors
modeled virtual memory as a large flat address-space which is partitioned into blocks. 
Each block of virtual memory is mapped into a block of real memory. A block of memory 
must be loaded into real memory before it can be accessed. The authors showed that 
some algorithms must still run slowly even if the algorithms were able to predict memory 
accesses in advance. 

2.2.1.4 Complexity of communication 

Algorithms that operate on large data sets spend a substantial amount of time ac- 
cessing data (reading from and writing to memory). Consequently, memory access time 
(also referred to in the literature as I/O- or communication-time) frequently dominates 
the computation time. Therefore, the RAM model, which does not account for memory 
effects, is inadequate for accurately predicting the performance of such algorithms. 

Depending on the machine organization, either the time to compute results or the 
time to read/write data may dominate the running time of an algorithm. A computation 
graph represents the dependency relationship between data items — there is a directed 
edge from vertex u to vertex v if the operation that computes the value at v requires 
that the value at u be already available. For computation on a collection of values whose 
dependencies form a grid graph, the tradeoff between the computation time and memory
access time was quantified by Papadimitriou and Ullman [PU87].

The I/O-complexity of an algorithm is the cost of inputs and outputs between faster
internal memory and slower secondary memory. Aggarwal and Vitter [AV88] proved
tight upper and lower bounds for the I/O-complexity of sorting, computing the FFT,
permuting, and matrix transposition. Hong and Kung [HK81] introduced an abstract
model of pebbling a computation graph to analyze the I/O-complexity of algorithms. The
vertices of the graph that hold pebbles represent data items that are loaded into main 
memory. With a limited number of pebbles available, the number of moves needed to 
transfer all the pebbles from the input vertices to the output vertices of the computation 
graph is the number of I/O operations between main memory and external memory. 

Interprocessor communication is a significant bottleneck in multiprocessor architec- 
tures, and it becomes more severe as the number of processors increases. In fact, depend- 
ing on the degree of parallelism of the problem itself, the communication time between 
processors frequently limits the execution speed. Aggarwal et al. [ACS90] proposed the
LPRAM model for parallel random access machines that incorporates both the compu- 
tational power and communication delay of parallel architectures. For this model, they 
proved upper bounds on both computation time and communication steps using p proces- 
sors for a number of algorithms, including matrix multiplication, sorting, and computing 
an n-point FFT. 

2.2.2 External memory algorithms 

Vitter [Vit] surveyed the state of the art in the design and analysis of data structures
and algorithms that operate on data sets that are too large to fit in main memory. 
These algorithms try to reduce the performance bottleneck of accesses to slower external 
memory. 

There has been considerable interest in the area of I/O-efficient algorithms for a long
time. Knuth [Knu73] investigated sorting algorithms that work on files that are too large
to fit in fast internal memory. For example, when the file to be sorted is stored on a 
sequential tape, a process of loading blocks of records into internal memory where they 
are sorted and using the tape to merge the sorted blocks turns out quite naturally to be 
more efficient than running a sorting algorithm on the entire file. 

Grossman and Silverman [GS73] considered the very general problem of storing records
on a secondary storage device to minimize expected retrieval time, when the probability 
of accessing any record is known in advance. The authors model the pattern of accesses 
by means of a parameter that characterizes the degree to which the accesses are sequential 
in nature. 

There has been interest in the numerical computing field in improving the performance 
of algorithms that operate on large matrices [CS]. A successful strategy is to partition the
matrix into rectangular blocks, each block small enough to fit entirely in main memory 
or cache, and operate on the blocks independently. 



The same blocking strategy has been employed for graph algorithms [ABCP98], [CGG+95],
[NGV96]. The idea is to cover an input graph with subgraphs; each subgraph is a small
diameter neighborhood of vertices just big enough to fit in main memory. A computation 
on the entire graph can be performed by loading each neighborhood subgraph into main 
memory in turn, computing the final results for all vertices in the subgraph, and storing 
back the results. 

Gil and Itai [GI99] studied the problem of storing a binary tree in a virtual memory
system to minimize the number of page faults. They considered the problem of allocating 
the nodes of a given binary tree (not necessarily a search tree) to virtual memory pages, 
called a packing, to optimize the cache performance for some pattern of accesses to 
the tree nodes. The authors investigated the specific model for tree accesses in which 
a node is accessed only via the path from the root to that node. They presented a 
dynamic programming algorithm to find a packing that minimizes the number of page 
faults incurred and the number of different pages visited while accessing a node. In 
addition, the authors proved that the problem of finding an optimal packing that also 
uses the minimum number of pages is NP-complete, but they presented an efficient
approximation algorithm. 

2.2.3 Non-uniform memory architecture

In a non-uniform memory architecture (NUMA), each processor contains a portion 
of the shared memory, so access times to different parts of the shared address space can 
vary, sometimes significantly. 






NUMA architectures have been proposed for large-scale multiprocessor computers. 
For instance, Wilson [Wil87] proposed an architecture with hierarchies of shared buses
and caches. The author proposed extensions of cache coherency protocols to maintain 
cache coherency in this model and presented simulations to demonstrate that a 128 
processor computer could be constructed using this architecture that would achieve a 
substantial fraction of its peak performance. 

A related architecture proposed by Hagersten et al. [HLH92], called the Cache-Only
Memory Architecture (COMA), is similar to a NUMA in the sense that each processor 
holds a portion of the shared address space. In the COMA, however, the allocation of 
the shared address space among the processors can be dynamic. All of the distributed 
memory is organized like large caches. The cache belonging to each processor serves two 
purposes — it caches the recently accessed data for the processor itself and also contains 
a portion of the shared memory. A coherence protocol is used to manage the caches. 

2.2.4 Models for non-uniform memory

One motivation for a better model of computation is the desire to model real computers
more accurately. We want to be able to design and analyze algorithms, predict
their performance, and characterize the hardness of problems. Consequently, we want 
a simple, elegant model that provides a faithful abstraction of an actual computer. Be- 
low, we survey the theoretical models of computation that have been proposed to model 
memory effects in actual computers. 

The seminal paper by Aggarwal et al. [AACS87] introduced the Hierarchical Memory
Model (HMM) of computation with logarithmic memory access cost, i.e., access to the
memory location at address $a$ takes time $O(\log a)$. The HMM model seems realistic
enough to model a computer with multiple levels in the memory hierarchy. It conforms
to our intuition that successive levels in memory become slower but bigger. Standard
polynomial-time RAM algorithms can run on this HMM model with an extra factor of
at most $O(\log n)$ in the running time. The authors showed that some algorithms can be
rewritten to reduce this factor by taking advantage of locality of reference, while other 
algorithms cannot be improved asymptotically. 

Aggarwal et al. [ACS87] proposed the Hierarchical Memory model with Block Transfer
(HMBT) as a better model that incorporates the cost of data transfer between levels in
the memory hierarchy. The HMBT model allows data to be transferred between levels
in blocks in a pipelined manner, so that it takes only constant time per unit of memory
after the initial item in the block. The authors considered variants of the model with
different memory access costs: $f(a) = \log a$, $f(a) = a^{\beta}$ for $0 < \beta < 1$, and $f(a) = a$.

Aggarwal and Chandra [AC88] proposed a model $VM_f$ for a computer with virtual
memory. The virtual memory on the $VM_f$ model consists of a hierarchical partitioning
of memory into contiguous intervals or blocks. Some subset of the blocks at any level are
stored in faster (real) memory at any time. The blocks and sub-blocks of virtual memory
are used to model disk blocks, pages of real memory, cache lines, etc. The authors' model
for the real memory is the HMBT model $BT_f$ in which blocks of real memory can be
transferred between memory levels in unit time per location after the initial access, i.e.,
in a pipelined manner. The $VM_f$ is considered a higher-level abstraction on which to
analyze application programs, while the running time is determined by the time taken by
the underlying block transfers. In both the models considered, the $VM_f$ and the $BT_f$,
the parameter $f$ is a memory cost function representing the cost of accessing a location
in real or virtual memory.

The Uniform Memory Hierarchy (UMH) model of computation proposed by Alpern 
et al. [ACFS94] incorporates a number of parameters that model the hierarchical nature
of computer memory. Like the HMBT, the UMH model allows data transfers between 
successive memory levels via a bus. The transfer cost along a bus is parameterized by 
the bandwidth of the bus. Other parameters include the size of a block and the number 
of blocks in each level of memory. 

Regan [Reg96] introduced the Block Move (BM) model of computation that extended
the ideas of the HMBT model proposed by Aggarwal et al. [ACS87]. The BM model
allows more complex operations such as shuffling and reversing of blocks of memory, as 
well as the ability to apply other finite transductions besides "copy" to a block of memory. 
The memory-access cost of a block transfer, similar to that in the HMBT model, is unit 
cost per location after the initial access. Regan proved that different variants of the model 
are equivalent up to constant factors in the memory-access cost. He studied complexity 
classes for the BM model and compared them with standard complexity classes defined 
for the RAM and the Turing machine. 

Two extensions of the HMBT model, the Parallel HMBT (P-HMBT) and the pipelined
P-HMBT (PP-HMBT), were investigated by Juurlink and Wijshoff [JW94]. In these
models, data transfers between memory levels may proceed concurrently. The authors
proved tight bounds on the total running time of several problems on the P-HMBT
model with access cost function $f(a) = \lfloor \log a \rfloor$. The P-HMBT model is identical to the
HMBT model except that block transfers of data are allowed to proceed in parallel be- 
tween memory levels, and a transfer can take place only between successive levels. In the 
PP-HMBT model, different block transfers involving the same memory location can be 
pipelined. The authors showed that the P-HMBT and HMBT models are incomparable 
in strength, in the sense that there are problems that can be solved faster on one model 
than on the other; however, the PP-HMBT model is strictly more powerful than both 
the HMBT and the P-HMBT models. 

A number of models have also been proposed for parallel computers with hierarchical 
memory. 

Valiant [Val89] proposed the Bulk-Synchronous Parallel (BSP) model as an abstract
model for designing and analyzing parallel programs. The BSP model consists of com- 
ponents that perform computation and memory access tasks and a router that delivers 
messages point-to-point between the components. There is a facility to synchronize all 
or a subset of components at the end of each superstep. The model emphasizes the sep- 
aration of the task of computation and the task of communicating between components. 
The purpose of the router is to implement access by the components to shared memory 
in parallel. In [Val90], Valiant argues that the BSP model can be implemented efficiently
in hardware, and therefore, it serves as both an abstract model for designing, analyzing 
and implementing algorithms as well as a realistic architecture realizable in hardware. 

Culler et al. [CKP+96] proposed the LogP model of a distributed-memory multiprocessor
machine in which processors communicate by point-to-point messages. The
performance characteristics of the interconnection network are modeled by four parameters
$L$, $o$, $g$, and $P$: $L$ is the latency incurred in transmitting a message over the network,
$o$ is the overhead during which the processor is busy transmitting or receiving a message,
$g$ is the minimum gap (time interval) between consecutive message transmissions
or receptions by a processor, and $P$ is the number of processors or memory modules. The
LogP model does not model local architectural features, such as caches and pipelines,
at each processor.

For a comprehensive discussion of computational models, including models for hierarchical
memory, we refer the reader to the book by Savage [Sav98].

For the rest of this thesis, we focus on a generalization of the HMM model due to 
Aggarwal et al. [AACS87] where the memory cost function can be an arbitrary nondecreasing
function, not just logarithmic.

Now that we have a more realistic model of computation, our next goal is to re-analyze 
existing algorithms and data structures, and either prove that they are still efficient in this 
new model or design better ones. Also, in the cases where we observe worse performance 
on the new model, we would also like to be able to prove nontrivial lower bounds. This 
leads to our primary interest in this thesis, which studies the problem of constructing 
minimum-cost binary search trees on a hierarchical memory model of computation. 

CHAPTER 3

Algorithms for Constructing Optimum and Nearly 
Optimum Binary Search Trees 



3.1 The HMM model 

Our version of the HMM model of computation consists of a single processor with 
a potentially unbounded number of memory locations with addresses 1, 2, 3, .... We 
identify a memory location by its address. A location in memory can store a finite but 
arbitrarily large integer value. 

The processor can execute any instruction in constant time, not counting the time 
spent reading from or writing to memory. Some instructions read operands from memory 
or write results into the memory. Such instructions can address any memory location 
directly by its address; this is called "random access" to memory, as opposed to sequential 
access. At most one memory location can be accessed at a time. The time taken to read 
and write a memory location is the same. 

The HMM is controlled by a program consisting of a finite sequence of instructions. 
The state of the HMM is defined by the sequence number of the current instruction and 
the contents of memory. 

In the initial state, the processor is just about to execute the first instruction in its 
program. If the length of the binary representation of the input is n, then memory 
locations 1 through n contain the input, and all memory locations at higher addresses 
contain zeros. The program is not stored in memory but encoded in the processor's finite 
control. 

The memory organization of the HMM model is dramatically different from that of 
the RAM. On the HMM, accessing different memory locations may take different amounts 
of time. Memory is organized in a hierarchy, from fastest to slowest. Within each level
of the hierarchy, the cost of accessing a memory location is the same. 

More precisely, the memory of the HMM is organized into a hierarchy
$M_1, M_2, \ldots, M_h$ with $h$ different levels, where $M_l$ denotes the set of memory
locations in level $l$ for $1 \le l \le h$. Let $m_l = |M_l|$ be the number of memory locations
in $M_l$. The time to access every location in $M_l$ is the same. Let $c_l$ be the time taken
to access a single memory location in $M_l$. Without loss of generality, the levels in the
memory hierarchy are organized from fastest to slowest, so that $c_1 < c_2 < \cdots < c_h$.
We will refer to the memory locations with the lowest cost of access, $c_1$, as the
"cheapest" memory locations.

For an HMM, we define a memory cost function $\mu : \mathbb{N} \to \mathbb{N}$ that gives the cost $\mu(a)$
of a single access to the memory location at address $a$. The function $\mu$ is defined by the
following increasing step function:

$$
\mu(a) =
\begin{cases}
c_1 & \text{for } 0 < a \le m_1 \\
c_2 & \text{for } m_1 < a \le m_1 + m_2 \\
c_3 & \text{for } m_1 + m_2 < a \le m_1 + m_2 + m_3 \\
\;\vdots \\
c_h & \text{for } \sum_{l=1}^{h-1} m_l < a \le \sum_{l=1}^{h} m_l.
\end{cases}
$$



We do not make any assumptions about the relative sizes of the levels in the hierarchy,
although we expect that $m_1 < m_2 < \cdots < m_h$ in an actual computer.
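
As an illustration, the step function can be evaluated with a prefix sum over the level sizes followed by a binary search. The sketch below is not part of the original development: the names `make_mu`, `sizes`, and `costs` are hypothetical, and the three-level hierarchy in the usage note is made up.

```python
from bisect import bisect_left
from itertools import accumulate


def make_mu(sizes, costs):
    """Build the memory cost function for level sizes m_1..m_h and costs c_1..c_h."""
    if len(sizes) != len(costs):
        raise ValueError("need exactly one access cost per memory level")
    # boundaries[l] is the last address that belongs to level l + 1 (0-based l).
    boundaries = list(accumulate(sizes))

    def mu(a):
        if not 1 <= a <= boundaries[-1]:
            raise ValueError("address outside the memory hierarchy")
        # Index of the first level whose last address is >= a.
        return costs[bisect_left(boundaries, a)]

    return mu
```

For example, `make_mu([2, 3, 5], [1, 4, 10])` describes a hypothetical hierarchy in which addresses 1 and 2 cost 1 per access, addresses 3 through 5 cost 4, and addresses 6 through 10 cost 10.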

A memory configuration with $s$ locations is a sequence $C_s = \langle n_l \mid 1 \le l \le h \rangle$ where
each $n_l$ is the number of memory locations from level $l$ in the memory hierarchy and
$\sum_{l=1}^{h} n_l = s$.

The running time of a program on the HMM model consists of the time taken by 
the processor to execute the instructions according to the program and the time taken 
to access memory. Clearly, if even the fastest memory on the HMM is slower than the 
uniform-cost memory on the RAM, then the same program will take longer on the HMM 
than on the RAM. Assume that the RAM memory is unit cost per access, and that 
$1 \le c_1 < c_2 < \cdots < c_h$. Then, the running time of an algorithm on the HMM will be at
most $c_h$ times that on the RAM. An interesting question is whether the algorithm can
be redesigned to take advantage of locality of reference so that its running time on the
HMM is less than $c_h$ times the running time on the RAM.

3.2 The HMM2 model

The Hierarchical Memory Model with two memory levels (HMM2) is the special 
case of the general HMM model with h = 2. In the HMM2, memory is organized in a 
hierarchy consisting of only two levels, denoted by $M_1$ and $M_2$. There are $m_1$ locations
in $M_1$ and $m_2$ locations in $M_2$. The total number of memory locations is $m_1 + m_2 = n$.
A single access to any location in $M_1$ takes time $c_1$, and an access to any location in
$M_2$ takes time $c_2$, with $c_1 < c_2$. We will refer to the memory locations in $M_1$ as the
"cheaper" or "less expensive" locations.

3.3 Optimum BSTs on the HMM model 

We study the following problem for the HMM model with $n$ memory locations and
an arbitrary memory cost function $\mu : \{1, 2, \ldots, n\} \to \mathbb{N}$.

Problem 5. [Constructing an optimum BST on the HMM] Suppose we are given
a set of $n$ keys, $x_1, x_2, \ldots, x_n$ in order, the probabilities $p_i$ for $1 \le i \le n$ that a search
argument $y$ equals $x_i$, and the probabilities $q_j$ for $0 \le j \le n$ that $x_j < y < x_{j+1}$. The
problem is to construct a binary search tree $T$ over the set of keys and compute a memory
assignment function $\phi : V(T) \to \{1, 2, \ldots, n\}$ that assigns the (internal) nodes of $T$ to
memory locations such that the expected cost of a search is minimized.

Let $(T, \phi)$ denote a potential solution to the above problem: $T$ is the combinatorial
structure of the tree, and the memory assignment function $\phi$ maps the internal nodes of
$T$ to memory locations.

If $v$ is an internal node of $T$, then $\phi(v)$ is the address of the memory location where
$v$ is stored, and $\mu(\phi(v))$ is the cost of a single access to $v$. If $v$ stores the key $x_i$, then
we will sometimes write $\phi(x_i)$ for $\phi(v)$. On the other hand, if $v$ is an external node
of $T$, then such a node does not actually exist in the tree; however, it does contribute
to the probability that its parent node is accessed. Therefore, for an external node
$v$, we use $\phi(v)$ to denote the memory location where the parent of $v$ is stored. Let
$T_v$ denote the subtree of $T$ rooted at $v$. Now $T_v$ is a binary search tree over some
subset, say $\{x_i, x_{i+1}, \ldots, x_j\}$, of keys; let $w(T_v)$ denote the sum of the corresponding
probabilities: $w(T_v) = w_{i,j} = \sum_{k=i}^{j} p_k + \sum_{k=i-1}^{j} q_k$. (If $v$ is the external node $z_j$, we use
the convention that $T_v$ is a subtree over the empty set of keys from $x_{j+1}$ through $x_j$, and
$w(T_v) = w_{j+1,j} = q_j$.) Therefore, $w(T_v)$ is the probability that the search for a key in $T$
proceeds anywhere in the subtree $T_v$.

On the HMM model, making a single comparison of the search argument $y$ with
the key $x_i$ incurs, in addition to the constant computation time, a cost of $\mu(\phi(x_i))$ for
accessing the memory location where the corresponding node of $T$ is stored. By the cost
of $(T, \phi)$, we mean the expected cost of a search:

$$
\mathrm{cost}((T, \phi)) = \sum_{i=1}^{n} w(T_{x_i}) \cdot \mu(\phi(x_i)) + \sum_{j=0}^{n} w(T_{z_j}) \cdot \mu(\phi(z_j)) \tag{3.1}
$$

where the first summation is over all $n$ internal nodes $x_i$ of $T$ and the second summation
is over the $n + 1$ external nodes $z_j$.

Here is another way to derive the above formula: the search algorithm accesses the
node $v$ whenever the search proceeds anywhere in the subtree rooted at $v$, and the
probability of this event is precisely $w(T_v) = w_{i,j}$. The contribution of the node $v$ to the
total cost is the probability $w(T_v)$ of accessing $v$ times the cost $\mu(\phi(v))$ of a single access
to the memory location containing $v$.

The pair $(T^*, \phi^*)$ is an optimum solution to an instance of Problem 5 if $\mathrm{cost}((T^*, \phi^*))$
is minimum over all binary search trees $T$ and functions $\phi$ assigning the nodes of $T$ to
memory locations. We show below in Lemma 7 that for a given tree $T$ there is a unique
function $\phi$ that optimally assigns nodes of $T$ to memory locations.

It is easy to see that on the standard RAM model where every memory access takes
unit time, equation (3.1) is equivalent to equation (1.2). Each node $v$ contributes once
to the sum on the right side of (3.1) for each of its ancestors in $T$.

3.3.1 Storing a tree in memory optimally 

The following lemmas show that the problem of constructing optimum BSTs specifically
on the HMM model is interesting because of the interplay between the two parameters:
the combinatorial structure of the tree and the memory assignment; restricted versions
of the general problem have simple solutions.

Consider the following restriction of Problem 5 with the combinatorial structure of
the BST $T$ fixed.

Problem 6. Given a binary search tree $T$ over the set of keys $x_1$ through $x_n$, compute
an optimum memory assignment function $\phi : V(T) \to \{1, 2, \ldots, n\}$ that assigns the
nodes of $T$ to memory locations such that the expected cost of a search is minimized.

Let $\pi(v)$ denote the parent of the node $v$ in $T$; if $v$ is the root, then let $\pi(v) = v$.
Let $\phi^*$ denote an optimum memory assignment function that assigns the nodes of $T$ to
locations in memory.

Lemma 7. With $T$ fixed, for every node $v$ of $T$,

$$
\mu(\phi^*(\pi(v))) \le \mu(\phi^*(v)).
$$

In other words, for a fixed BST T, there exists an optimal memory assignment function 
that assigns every node of T to a memory location that is no more expensive than the 
memory locations assigned to its children. 

Proof: Assume to the contrary that for a particular node $v$, we have
$\mu(\phi^*(\pi(v))) > \mu(\phi^*(v))$. The contribution of $v$ and $\pi(v)$ to the total cost of the tree in
the summation (3.1) is

$$
w(T_{\pi(v)}) \, \mu(\phi^*(\pi(v))) + w(T_v) \, \mu(\phi^*(v)).
$$

The node $\pi(v)$ is accessed whenever the search proceeds anywhere in the subtree
rooted at $\pi(v)$, and likewise with $v$. Since each $p_i, q_j \ge 0$, $\pi(v)$ is accessed at least as
often as $v$, i.e., $w(T_{\pi(v)}) \ge w(T_v)$.

Therefore, since $\mu(\phi^*(v)) < \mu(\phi^*(\pi(v)))$ by our assumption,

$$
w(T_{\pi(v)}) \, \mu(\phi^*(v)) + w(T_v) \, \mu(\phi^*(\pi(v))) \le w(T_{\pi(v)}) \, \mu(\phi^*(\pi(v))) + w(T_v) \, \mu(\phi^*(v))
$$

so that we can swap the memory locations where $v$ and its parent $\pi(v)$ are stored and
not increase the cost of the solution. $\Box$

As a consequence, the root of any subtree is stored in the cheapest memory location 
among all nodes in that subtree. 

Lemma 8. For fixed $T$, the optimum memory assignment function, $\phi^*$, can be determined
by a greedy algorithm. The running time of this greedy algorithm is $O(n \log n)$
on the RAM.

Proof: It follows from Lemma 7 that under some optimum memory assignment, the
root of the tree must be assigned the cheapest available memory location. Again from 
the same lemma, the next cheapest available location can be assigned only to one of the 
children of the root, and so on. The following algorithm implements this greedy strategy. 

By the weight of a node $v$ in the tree, we mean the sum of the probabilities of all
nodes in the subtree rooted at $v$, i.e., $w(T_v)$. The value $w(T_v)$ can be computed for every
subtree $T_v$ in linear time and stored at $v$. We maintain the set of candidates for the
next cheapest location in a heap ordered by their weights. Among all candidates, the 
optimum choice is to assign the cheapest location to the heaviest vertex. We extract this 
vertex, say u, from the top of the heap, store it in the next available memory location, 
and insert the two children of u into the heap. Initially, the heap contains just the root 
of the entire tree, and the algorithm continues until the heap is empty. 

This algorithm performs n insertions and n deletions on a heap containing at most n 
elements. Therefore, its running time on the uniform-cost RAM model is $O(n \log n)$. $\Box$
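
The greedy strategy in the proof can be sketched as follows. This is only an illustration under stated assumptions: the `Node` class and the function name `greedy_assignment` are hypothetical, each node is assumed to carry its precomputed weight $w(T_v)$, and addresses 1, 2, 3, ... are assumed to be in nondecreasing order of access cost, so that "the next cheapest location" is simply the next address. Python's `heapq` is a min-heap, so weights are negated to extract the heaviest candidate first.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Node:
    key: str
    weight: float                      # w(T_v), assumed to be precomputed
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def greedy_assignment(root):
    """Return a dict mapping each key to its address; heaviest candidate goes first."""
    phi = {}
    tie = count()                      # tie-breaker so that Node objects are never compared
    heap = [(-root.weight, next(tie), root)]
    address = 1                        # next cheapest free location
    while heap:
        _, _, u = heapq.heappop(heap)  # heaviest candidate
        phi[u.key] = address
        address += 1
        for child in (u.left, u.right):
            if child is not None:
                heapq.heappush(heap, (-child.weight, next(tie), child))
    return phi
```

Each node is pushed and popped exactly once, which matches the $n$ insertions and $n$ deletions counted in the proof.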

3.3.2 Constructing an optimum tree when the memory assignment is fixed

Consider the following restriction of Problem 5 where the memory assignment function
is given. 

Problem 9. Suppose each of the keys $x_i$, for $1 \le i \le n$, is assigned a priori a fixed
location $\phi(x_i)$ in memory. Compute the structure of a binary search tree of minimum
cost where every node $v_i$ of the tree corresponding to key $x_i$ is stored in memory location
$\phi(x_i)$.

Lemma 10. Given a fixed assignment of keys to memory locations, i.e., a function $\phi$
from the set of keys (equivalently, the set of nodes of any BST $T$) to the set of memory
locations, the BST $T^*$ of minimum cost can be constructed by a dynamic programming
algorithm. The running time of this algorithm is $O(n^3)$ on the RAM.

Proof: The principle of optimality clearly applies here so that a BST is optimum if and 
only if each subtree is optimum. The standard dynamic programming algorithm proceeds 
as follows: 

Let $\mathrm{cost}(T^*_{i,j})$ denote the cost of an optimum BST over the keys $x_i, x_{i+1}, \ldots, x_j$ and
the corresponding probabilities $p_i, p_{i+1}, \ldots, p_j$ and $q_{i-1}, q_i, \ldots, q_j$, given the fixed memory
assignment $\phi$. By the principle of optimality,

$$
\begin{aligned}
\mathrm{cost}(T^*_{i,j}) &= \min_{i \le k \le j} \left( w_{i,j} \cdot \mu(\phi(x_k)) + \mathrm{cost}(T^*_{i,k-1}) + \mathrm{cost}(T^*_{k+1,j}) \right) && \text{for } i \le j \\
\mathrm{cost}(T^*_{i+1,i}) &= w_{i+1,i} = q_i.
\end{aligned} \tag{3.2}
$$

Recall that $w_{i,j}$ is the probability that the root of this subtree is accessed, and $\mu(\phi(x_k))$
is the cost of a single access to the memory location $\phi(x_k)$ where $x_k$ is stored.

Notice that this expression is equivalent to equation (2.1) except for the multiplicative
factor $\mu(\phi(x_k))$. Therefore, ALGORITHM K1 from section 2.1.1.1 can be used to
construct the optimum binary search tree efficiently, given an assignment of keys to
memory locations. $\Box$
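
A minimal sketch of this dynamic program is given below, following recurrence (3.2) literally, including its base case in which an empty subtree contributes $q_i$. The function name `optimum_bst_fixed_assignment` and the array layout are assumptions made for illustration: `p[1..n]` and `q[0..n]` hold the probabilities (or unnormalized frequencies), and `access_cost[k]` stands for $\mu(\phi(x_k))$; index 0 of `p` and `access_cost` is unused.

```python
def optimum_bst_fixed_assignment(p, q, access_cost):
    """Return (cost, root) tables; cost[i][j] and root[i][j] refer to keys x_i..x_j."""
    n = len(q) - 1
    # Rows 1..n+1 and columns 0..n are used, so that cost[i][i-1] is the empty subtree.
    w = [[0] * (n + 1) for _ in range(n + 2)]
    cost = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        w[i][i - 1] = q[i - 1]          # w_{i,i-1} = q_{i-1}
        cost[i][i - 1] = q[i - 1]       # base case of (3.2)
    for size in range(1, n + 1):        # subtrees with 1, 2, ..., n keys
        for i in range(1, n - size + 2):
            j = i + size - 1
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            best = None
            for k in range(i, j + 1):   # try every key x_k as the root
                c = w[i][j] * access_cost[k] + cost[i][k - 1] + cost[k + 1][j]
                if best is None or c < best:
                    best, root[i][j] = c, k
            cost[i][j] = best
    return cost, root
```

With frequencies `p = [0, 1, 8]`, `q = [1, 0, 1]` and unit access costs, the heavier key $x_2$ becomes the root; if $x_2$ is instead pinned to a location five times as expensive, the recurrence prefers $x_1$ at the root. The three nested loops give the $O(n^3)$ bound of the lemma.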

In general, it does not seem possible to use a monotonicity principle to reduce the
running time to $O(n^2)$, as in ALGORITHM K2 of section 2.1.1.1.

3.3.3 Naive algorithm 

A naive algorithm for Problem 5 is to try every possible mapping of keys to memory
locations. Lemma 10 guarantees that we can then use dynamic programming to construct
an optimum binary search tree for that memory assignment. We select the minimum-cost 
tree over all possible memory assignment functions. 

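There are

$$
\binom{n}{m_1, m_2, \ldots, m_h} = \frac{n!}{m_1! \, m_2! \cdots m_h!}
$$

such mappings from $n$ keys to $n$ memory locations with $m_1$ of the first type, $m_2$ of the
second type, and so on. The multinomial coefficient is maximized when
$m_1 = m_2 = \cdots = m_{h-1} = \lfloor n/h \rfloor$. The dynamic programming algorithm takes $O(n^3)$ time to compute the
optimum BST for each fixed memory assignment. Hence, the running time of the naive
algorithm is

$$
\begin{aligned}
O\!\left( \frac{n!}{((n/h)!)^h} \cdot n^3 \right)
&= O\!\left( \frac{\sqrt{2\pi n}\,(n/e)^n}{\left( \sqrt{2\pi (n/h)}\,((n/h)/e)^{n/h} \right)^h} \cdot n^3 \right) && \text{using Stirling's approximation} \\
&= O\!\left( \frac{h^{n+h/2}}{(2\pi n)^{(h-1)/2}} \cdot n^3 \right) \\
&= O\!\left( \frac{h^{n+h/2} \cdot n^{3-(h-1)/2}}{(2\pi)^{(h-1)/2}} \right) \\
&= O(h^n \cdot n^3).
\end{aligned} \tag{3.3}
$$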

Unfortunately, the above algorithm is inefficient and therefore infeasible even for small 
values of n because its running time is exponential in n. We develop much more efficient 
algorithms in the following sections. 
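
To get a feel for how quickly the number of candidate memory assignments grows, the multinomial coefficient can be evaluated exactly with integer arithmetic. The helper name `count_assignments` is hypothetical, and the sketch assumes that the level sizes sum to the number of keys $n$.

```python
from math import factorial


def count_assignments(level_sizes):
    """Number of ways to place n keys into levels of the given sizes, n = sum(level_sizes)."""
    n = sum(level_sizes)
    count = factorial(n)
    for m in level_sizes:
        # Every intermediate quotient is itself an integer, so floor division is exact.
        count //= factorial(m)
    return count
```

For instance, `count_assignments([2, 2, 2])` gives 90 assignments for only six keys and three levels, and the naive algorithm runs the cubic dynamic program once for each of them.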

3.3.4 A dynamic programming algorithm: algorithm Parts 

A better algorithm uses dynamic programming to construct optimum subtrees bottom-up,
like ALGORITHM K1 from section 2.1.1.1. Our new algorithm, ALGORITHM PARTS,
constructs an optimum subtree $T^*_{i,j}$ for each $i$, $j$ such that $1 \le i \le j \le n$ and for every
memory configuration $\langle n_1, n_2, \ldots, n_h \rangle$ consisting of the $j - i + 1$ memory locations
available at this stage in the computation. For each possible choice $x_k$ for the root of
the subtree $T_{i,j}$, there are at most $j - i + 2 \le n + 1$ different ways to partition the
number of available locations in each of $h - 1$ levels of the memory hierarchy between
the left and right subtrees of $x_k$. (Since the number of memory locations assigned to any
subtree equals the number of nodes in the subtree, we have the freedom to choose only
the number of locations from any $h - 1$ levels because the number of locations from the
remaining level is then determined.)

We modify ALGORITHM K1 from section 2.1.1.1 as follows. ALGORITHM K1 builds
larger and larger optimum subtrees $T^*_{i,j}$ for all $i$, $j$ such that $1 \le i \le j \le n$. For every
choice of $i$ and $j$, the algorithm iterates through the $j - i + 1$ choices for the root of the
subtree from among $\{x_i, x_{i+1}, \ldots, x_j\}$. The left subtree of $T^*_{i,j}$ with $x_k$ at the root is a
BST, say $T^{(L)}$, over the keys $x_i$ through $x_{k-1}$, and the right subtree is a BST, say $T^{(R)}$,
over the keys $x_{k+1}$ through $x_j$.

The subtree $T_{i,j}$ has $j - i + 1$ nodes. Suppose the number of memory locations
available for the subtree $T_{i,j}$ from each of the memory levels is $n_l$ for $1 \le l \le h$, where
$\sum_{l=1}^{h} n_l = j - i + 1$. There are

\[
\binom{(j-i+1)+h-1}{h-1} = \binom{j-i+h}{h-1}
  = O\left(\frac{(n+h)^{h-1}}{(h-1)!}\right)
  = O\left(\frac{2^{h-1}}{(h-1)!} \cdot n^{h-1}\right) \quad \text{since } h \le n
\]

different ways to partition $j - i + 1$ objects into $h$ parts without restriction, and therefore,
at most as many different memory configurations with $j - i + 1$ memory locations. (There
are likely to be far fewer different memory configurations because there are at most $m_1$
memory locations from the first level, at most $m_2$ from the second, and so on, in any
configuration.)
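
The gap between the unrestricted count and the number of configurations that respect the level capacities can be seen on a small instance. The sketch below enumerates configurations by brute force; the function name `memory_configurations` and the capacities $(2, 3, 5)$ are hypothetical values chosen only for illustration.

```python
from itertools import product
from math import comb

def memory_configurations(size, capacities):
    """All tuples (n_1, ..., n_h) with 0 <= n_l <= capacities[l] that sum to size."""
    ranges = (range(min(cap, size) + 1) for cap in capacities)
    return [c for c in product(*ranges) if sum(c) == size]

capacities = (2, 3, 5)      # hypothetical m_1, m_2, m_3
size = 4                    # number of nodes in the subtree, j - i + 1
configs = memory_configurations(size, capacities)

# Unrestricted bound: ways to split `size` objects into h parts.
bound = comb(size + len(capacities) - 1, len(capacities) - 1)
print(len(configs), "configurations; unrestricted bound", bound)
```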

Let $\lambda$ be the smallest integer such that $n_\lambda > 0$; in other words, the cheapest available
memory location is from memory level $\lambda$.

For every choice of $i$, $j$, and $k$, there are at most $\min\{k - i + 1, n_\lambda\} \le n$ different choices
for the number of memory locations from level $\lambda$ to be assigned to the left subtree, $T^{(L)}$.
This is because the left subtree with $k - i$ nodes can be assigned any number from zero
to $\min\{k - i, n_\lambda - 1\}$ locations from the first available memory level, $\mathcal{M}_\lambda$. (Only at most
$n_\lambda - 1$ locations from $\mathcal{M}_\lambda$ are available after the root $x_k$ is stored in the cheapest available
location.) The remaining locations from $\mathcal{M}_\lambda$ available to the entire subtree are assigned
to the right subtree, $T^{(R)}$. Likewise, there are at most $\min\{k - i + 1, n_{\lambda+1} + 1\} \le n + 1$
different choices for the number of ways to partition the available memory locations from
the next memory level $\mathcal{M}_{\lambda+1}$ between the left and right subtrees, and so on. In general,
the number of memory locations from the memory level $l$ assigned to the left subtree,
$n_l^{(L)}$, ranges from 0 to at most $n_l$. Correspondingly, the number of memory locations
from the level $l$ assigned to the right subtree, $n_l^{(R)}$, is $n_l - n_l^{(L)}$.

We modify ALGORITHM K1 by inserting $h - \lambda \le h - 1$ more nested loops that iterate
through every such way to partition the available memory locations from $\mathcal{M}_\lambda$ through
$\mathcal{M}_{h-1}$ between the left and right subtrees of $T_{i,j}$ for a fixed choice of $x_k$ as the root.

ALGORITHM PARTS:

(Initialization)
for i := 0 to n
    Let C_0 be the empty memory configuration (0, 0, ..., 0)
    C[i + 1, i, C_0] ← q_i
    R[i + 1, i, C_0] ← NIL

for d := 0 to n − 1
    (Construct optimum subtrees with d + 1 nodes.)
    for each memory configuration C of size d + 1
        for i := 1 to n − d
            j ← i + d
            C[i, j, C] ← ∞
            R[i, j, C] ← NIL
            for k := i to j
                (Number of nodes in the left and right subtrees.)
                l ← k − i   (number of nodes in the left subtree)
                r ← j − k   (number of nodes in the right subtree)
                Call PROCEDURE PARTITION-MEMORY (figure 3.2) to compute
                the optimum way to partition the available memory locations.

Figure 3.1 ALGORITHM PARTS
PROCEDURE PARTITION-MEMORY:

Let C = (n_1, n_2, ..., n_h).
Let λ be the smallest integer such that n_λ > 0.

for n_λ^(L) := 0 to n_λ
    ...
        for n_{h−1}^(L) := 0 to n_{h−1}
            n_h^(L) ← l − (n_λ^(L) + n_{λ+1}^(L) + ... + n_{h−1}^(L))
            n_λ^(R) ← n_λ − n_λ^(L)
            n_{λ+1}^(R) ← n_{λ+1} − n_{λ+1}^(L)
            ...
            n_{h−1}^(R) ← n_{h−1} − n_{h−1}^(L)

            Use one cheap location for the root, i.e.,
            n_λ^(L) + n_λ^(R) = n_λ − 1.

            Let C^L = (0, ..., 0, n_λ^(L), n_{λ+1}^(L), ..., n_h^(L))
            Let C^R = (0, ..., 0, (n_λ − 1) − n_λ^(L), n_{λ+1} − n_{λ+1}^(L), ..., n_h − n_h^(L))

            Let T' be the tree with x_k at the root, and the left and right children
            are given by R[i, k − 1, C^L] and R[k + 1, j, C^R] respectively,
            i.e., T' is the tree with root x_k, left subtree T[i, k − 1, C^L],
            and right subtree T[k + 1, j, C^R].

            (Let c' be the cost of T'.)
            (The root of T' is stored in a location of cost c_λ.)
            c' ← c_λ · w_{i,j} + C[i, k − 1, C^L] + C[k + 1, j, C^R]
            if c' < C[i, j, C]
                R[i, j, C] ← (k, C^L)
                C[i, j, C] ← c'

Figure 3.2 PROCEDURE PARTITION-MEMORY
Just like ALGORITHM K1, ALGORITHM PARTS of figure 3.1 constructs arrays $R$ and
$C$, each indexed by the pair $i, j$, such that $1 \le i \le j \le n$, and the memory configuration
$C$ specifying the numbers of memory locations from each of the $h$ levels available to the
subtree $T_{i,j}$. Let $C = (n_1, n_2, \ldots, n_h)$. The array entry $R[i, j, C]$ stores the pair $(k, C^L)$,
where $k$ is the index of the root of the optimum subtree $T^*_{i,j}$ for memory configuration
$C$, and $C^L$ is the optimum memory configuration for the left subtree. In other words, $C^L$
specifies for each $l$ the number of memory locations $n_l^{(L)}$ out of the total $n_l$ locations from
level $l$ available to the subtree $T_{i,j}$ that are assigned to the left subtree. The memory
configuration $C^R$ of the right subtree is automatically determined: the number of memory
locations $n_l^{(R)}$ from level $l$ that are assigned to the right subtree is $n_l - n_l^{(L)}$, except that
one location from the cheapest memory level available is consumed by the root.

The structure of the optimum BST and the optimum memory assignment function is
stored implicitly in the array $R$. Let $T[i, j, C]$ denote the implicit representation of the
optimum BST over the subset of keys from $x_i$ through $x_j$ for memory configuration $C$. If
$R[1, n, C] = (k, C')$, then the root of the entire tree is $x_k$ and it is stored in the cheapest
available memory location of cost $c_\lambda$. The left subtree is over the subset of keys $x_1$ through
$x_{k-1}$, and the memory configuration for the left subtree is $C' = (0, \ldots, 0, n'_\lambda, n'_{\lambda+1}, \ldots, n'_h)$.
The right subtree is over the subset of keys $x_{k+1}$ through $x_n$, and the memory configuration
for the right subtree is $(0, \ldots, 0, (n_\lambda - 1) - n'_\lambda, n_{\lambda+1} - n'_{\lambda+1}, \ldots, n_h - n'_h)$.

In ALGORITHM PARTS, there are $3 + (h - 1) = h + 2$ nested loops each of which
iterates at most $n$ times, in addition to the loop that iterates over all possible memory
configurations of size $d + 1$ for $0 \le d \le n - 1$. Hence, the running time of the algorithm
is

\[
O\left(\frac{2^{h-1}}{(h-1)!} \cdot n^{h-1} \cdot n^{h+2}\right) = O\left(\frac{2^{h-1}}{(h-1)!} \cdot n^{2h+1}\right). \tag{3.4}
\]
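
The recurrence behind ALGORITHM PARTS can be sketched as a memoized top-down function rather than the bottom-up table of figure 3.1. This is only an illustration under stated assumptions: the name `parts_cost`, the 1-based arrays `p` and `q` (with `p[0]` as an unused placeholder), and the requirement that the level sizes sum to exactly $n$ are choices made here and are not part of the original algorithm description. The sketch returns the optimum cost only and does not record the array $R$.

```python
from functools import lru_cache
from itertools import product

def parts_cost(p, q, costs, sizes):
    """Optimum BST cost when level l has sizes[l] locations of cost costs[l]
    (cheapest level first). Assumes sum(sizes) == n, the number of keys."""
    n = len(p) - 1
    if sum(sizes) != n:
        raise ValueError("this sketch needs exactly one location per key")

    def w(i, j):
        # w_{i,j} = q_{i-1} + sum over t = i..j of (p_t + q_t)
        return q[i - 1] + sum(p[t] + q[t] for t in range(i, j + 1))

    @lru_cache(maxsize=None)
    def best(i, j, config):
        if i > j:                       # empty subtree: external node, cost q_j
            return q[j]
        lam = next(l for l, c in enumerate(config) if c > 0)
        rest = list(config)
        rest[lam] -= 1                  # the root takes one cheapest location
        result = float("inf")
        for k in range(i, j + 1):       # try every root x_k
            left_nodes = k - i
            # try every split of the remaining locations between the subtrees
            for left in product(*(range(c + 1) for c in rest)):
                if sum(left) != left_nodes:
                    continue
                right = tuple(c - a for c, a in zip(rest, left))
                cand = (costs[lam] * w(i, j)
                        + best(i, k - 1, left) + best(k + 1, j, right))
                result = min(result, cand)
        return result

    return best(1, n, tuple(sizes))
```

For example, `parts_cost([0, 3, 1, 2], [1, 1, 1, 1], costs=[1, 5], sizes=[1, 2])` evaluates three keys with one cheap and two expensive locations. The brute-force enumeration of splits is exponential in $h$, in line with the bound in equation (3.4).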

3.3.5 Another dynamic programming algorithm: algorithm Trunks 

In this subsection, we develop another algorithm that iteratively constructs optimum
subtrees $T^*_{i,j}$ over larger and larger subsets of keys. Fix an $i$ and $j$ with $1 \le i \le j \le n$
and $j - i = d$, and a memory configuration $C_{s+1} = (n_1, n_2, \ldots, n_{h-1}, n_h)$ consisting of
$s + 1$ memory locations from the first $h - 1$ levels of the memory hierarchy and none
from the last level, i.e., $n_1 + n_2 + \cdots + n_{h-1} = s + 1$ and $n_h = 0$. At iteration $s + 1$,
we require an optimum subtree, over the subset of keys from $x_i$ through $x_j$, with $s$ of its
nodes assigned to memory locations from the first $h - 1$ levels of the memory hierarchy
and the remaining $(j - i + 1) - s$ nodes stored in the most expensive locations. Call the
subtree induced by the nodes stored in the first $h - 1$ memory levels the trunk (short for
"truncated") of the tree. (Lemma 7 guarantees that the trunk will also be a tree, and the
root of the entire tree is also the root of the trunk. So, in fact, a trunk with $s + 1$ nodes of
a tree is obtained by pruning the tree down to $s + 1$ nodes by recursively deleting leaves.)
We require the optimum subtree $T^*_{1,n}$ with $\sum_{l=1}^{h-1} m_l = n - m_h$ nodes in the trunk, all of
which are assigned to the $n - m_h$ locations in the cheapest $h - 1$ memory levels. Recall
that $m_l$ is the number of memory locations in memory level $l$ for $1 \le l \le h$.

ALGORITHM TRUNKS in figure 3.3 constructs a table indexed by $i$, $j$, and $C_{s+1}$. There
are $\binom{n}{2}$ different choices of $i$ and $j$ such that $1 \le i \le j \le n$. Also, there are

\[
\binom{(s+1)+(h-1)-1}{h-2} = \binom{s+h-1}{h-2}
\]

different ways to partition $s + 1$ objects into $h - 1$ parts without restriction, and therefore,
at most as many different memory configurations with $s + 1$ memory locations from the
first $h - 1$ memory levels. (As mentioned earlier, there are likely to be far fewer different
memory configurations because there are restrictions on the number of memory locations
from each level in any configuration.)

For every value of $k$ from $i$ to $j$ and every $t$ from 0 to $s$, we construct a subtree with
$x_k$ at the root and $t$ nodes in the trunk of the left subtree (the left trunk) and $s - t$ nodes
in the trunk of the right subtree (the right trunk).

By Lemma 7, the root $x_k$ of the subtree is always stored in the cheapest available
memory location. There are at most $\binom{s}{t}$ ways to select $t$ out of the remaining $s$ memory
locations to assign to the left trunk. (In fact, since the $s$ memory locations are not
necessarily all distinct, there are likely to be far fewer ways to do this.) As $t$ iterates
from 0 through $s$, the total number of ways to partition the available $s$ memory locations
and assign them to the left and right trunks is at most

\[
\sum_{t=0}^{s} \binom{s}{t} = 2^{s}.
\]

When all the nodes of the subtree are stored in memory locations in level $h$ (the base
case when $s = 0$), an optimum subtree $T^*_{i,j}$ is one constructed by ALGORITHM K2 from
section 2.1.1.1. Therefore, in an initial phase, we execute ALGORITHM K2 to construct,
in $O(n^2)$ time, all optimum subtrees $T^*_{i,j}$ that fit entirely within one memory level, in
particular, the last and most expensive memory level.

ALGORITHM TRUNKS:

Initially, the optimum subtree T*_{i,j} is unknown for all i, j,
except when the subtree fits entirely in memory level M_h,
in which case the optimum subtree is the one
computed by ALGORITHM K2 during the initialization phase.

for d := 0 to n − 1
    for i := 1 to n − d
        j ← i + d
        (Construct an optimum BST over the subset of keys from x_i through x_j.)
        for k := i to j
            (Choose x_k to be the root of this subtree.)
            for s := 1 to n − m_h − 1
                (Construct a BST with s nodes in its trunk.)
                For every memory configuration C_s of size s
                    for t := 0 to s
                        (The left trunk has t nodes.)
                        For every choice of t out of the s memory locations
                        in C_s to assign to the left subtree
                            Let T' be the BST over the subset of keys from x_i through x_j
                            with x_k at the root,
                            t nodes in the trunk of the left subtree, and
                            s − t nodes in the trunk of the right subtree.

                            The left subtree of T' is the previously computed
                            optimum subtree over the keys x_i through x_{k−1}
                            with t nodes in its trunk, and the right subtree of T'
                            is the previously computed optimum subtree over the
                            keys x_{k+1} through x_j with s − t nodes in its trunk.

                            If the cost of T' is less than that of the minimum-cost
                            subtree found so far, then record T' as the new
                            optimum subtree.

Figure 3.3 ALGORITHM TRUNKS

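The total running time of the dynamic programming algorithm is, therefore,

\[
O\left(\sum_{d=0}^{n-1} \sum_{i=1}^{n-d} \sum_{k=i}^{i+d} \sum_{s=0}^{n-m_h-1} \binom{s+h-1}{h-2} \cdot 2^{s}\right).
\]

Let

\[
f(n) = \sum_{s=0}^{n-m_h-1} \binom{s+h-1}{h-2} \cdot 2^{s}.
\]

By definition,

\[
f(n) \le \sum_{s=0}^{n-m_h-1} \frac{(s+h-1)^{h-2}}{(h-2)!} \cdot 2^{s}
     = \frac{1}{(h-2)!} \sum_{s=0}^{n-m_h-1} 2^{s} (s+h-1)^{h-2}.
\]

Thus, $f(n)$ is bounded above by the sum of a geometric series whose ratio is at most
$2 \cdot (n - m_h - 1 + h - 1)$. Hence, we have

\[
f(n) \le \frac{1}{(h-2)!} \cdot \frac{2^{n-m_h} (n - m_h + h - 2)^{n-m_h} - 1}{2(n - m_h + h - 2) - 1}.
\]

Therefore, the running time of the algorithm is

\[
O\left(\sum_{d=0}^{n-1} \sum_{i=1}^{n-d} \sum_{k=i}^{i+d} f(n)\right)
  = O\left(\sum_{d=0}^{n-1} \sum_{i=1}^{n-d} f(n) \cdot (d+1)\right)
  = O\left(\frac{2^{n-m_h} \cdot (n - m_h + h)^{n-m_h} \cdot n^{3}}{(h-2)!}\right). \tag{3.5}
\]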

ALGORITHM TRUNKS is efficient when $n - m_h$ and $h$ are both small. For instance,
consider a memory organization in which the memory cost function grows as the tower
function defined by:

\[
\begin{aligned}
\mathrm{tower}(0) &= 1 \\
\mathrm{tower}(i+1) &= 2^{\mathrm{tower}(i)} = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \ \ (i + 1 \text{ times}) \quad \text{for all } i \ge 0.
\end{aligned}
\]

If $\mu(a) = \mathrm{tower}(a)$ is the memory cost function, then $\sum_{l=1}^{h-1} m_l = n - m_h \le \lg\left(\sum_{l=1}^{h} m_l\right) =
\lg n$, and $h = \log^* n$. For all practical purposes, $\log^* n$ is a small constant; therefore, the
running time bound of equation (3.5) is almost a polynomial in $n$.
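
The claim that $\log^* n$ is a small constant in practice is easy to check numerically. The following sketch defines the tower function as above and an iterated logarithm; the helper names `tower` and `log_star` are chosen here for illustration.

```python
from math import log2

def tower(i):
    """tower(0) = 1 and tower(i + 1) = 2 ** tower(i)."""
    value = 1
    for _ in range(i):
        value = 2 ** value
    return value

def log_star(n):
    """Number of times lg must be applied to n before the result is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = log2(x)
        count += 1
    return count

for i in range(5):
    print(i, tower(i), log_star(tower(i)))
```

Already `tower(5)` is $2^{65536}$, far larger than any realistic number of keys, so $h$ stays at most 5 for all practical inputs under this cost function.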
3.3.6 A top-down algorithm: algorithm Split

Suppose there are $n$ distinct memory costs, or $n$ levels in the memory hierarchy with
one location in each level. A top-down recursive algorithm to construct an optimum BST
has to decide at each step in the recursion how to partition the available memory locations
between the left and right subtrees. Note that the number of memory locations assigned
to the left subtree determines the number of keys in the left subtree, and therefore
identifies the root. So, for example, if $k$ of the available $n$ memory locations are assigned
to the left subtree, then there are $k$ keys in the left subtree, and hence, the root of the
tree is $x_{k+1}$.

At the top level, the root is assigned the cheapest memory location. Each of the
remaining $n - 1$ memory locations can be assigned to either the left or the right subtree,
so that $k$ of the $n - 1$ locations are assigned to the left subtree and $n - 1 - k$ locations
to the right subtree for every $k$ such that $0 \le k \le n - 1$. Thus, there are $2^{n-1}$ different
ways to partition the available $n - 1$ memory locations between the two subtrees of the
root. The algorithm proceeds recursively to compute the left and right subtrees.

The asymptotic running time of the above algorithm is given by the recurrence

\[
T(n) = 2^{n-1} + \max_{0 \le k \le n-1} \{ T(k) + T(n - 1 - k) \}.
\]

Now, $T(n)$ is at least $2^{n-1}$, which is a convex function, and $T(n)$ is a monotonically
increasing function of $n$. Therefore, a simple inductive argument shows that $T(n)$ itself
is convex, so that it achieves the maximum value at either $k = 0$ or $k = n - 1$. At $k = 0$,
$T(n) = 2^{n-1} + T(0) + T(n - 1)$, which is the same value as at $k = n - 1$. Therefore,

\[
\begin{aligned}
T(n) &\le 2^{n-1} + T(0) + T(n-1) \\
     &= \sum_{j=0}^{n-1} 2^{j} \\
     &= 2^{n} - 1 \\
     &= O(2^{n}).
\end{aligned}
\tag{3.6}
\]
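
The closed form can be checked by evaluating the recurrence directly. The sketch below assumes the base case $T(0) = 0$, which the text leaves implicit; the function name `split_time` is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def split_time(n):
    """T(n) = 2^(n-1) + max over 0 <= k <= n-1 of T(k) + T(n-1-k).
    Assumption for this sketch: T(0) = 0."""
    if n == 0:
        return 0
    return 2 ** (n - 1) + max(split_time(k) + split_time(n - 1 - k)
                              for k in range(n))

for n in range(1, 11):
    print(n, split_time(n), 2 ** n - 1)
```

With this base case the maximum is attained at the extreme splits and the recurrence evaluates to exactly $2^n - 1$.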



3.4 Optimum BSTs on the HMM2 model 

In this section, we consider the problem of constructing and storing an optimum BST
on the HMM2 model. Recall that the HMM2 model consists of $m_1$ locations in memory
level $\mathcal{M}_1$, each of cost $c_1$, and $m_2$ locations in memory level $\mathcal{M}_2$, each of cost $c_2$, with
$c_1 < c_2$.

3.4.1 A dynamic programming algorithm 

In this section, we develop a hybrid dynamic programming algorithm to construct
an optimum BST. Recall that ALGORITHM K2 of section 2.1.1.1 constructs an optimum
BST for the uniform-cost RAM model in $O(n^2)$ time. It is an easy observation that the
structure of an optimum subtree that fits entirely in one memory level is the same as
that of the optimum subtree on the uniform-cost RAM model. Therefore, in an initial
phase of our hybrid algorithm, we construct optimum subtrees with at most $\max\{m_1, m_2\}$
nodes that fit in the largest memory level. In phase II, we construct larger subtrees.

Recall from equation (2.1) that on the uniform-cost RAM model the cost $c(i, j)$ of an
optimum BST over the subset of keys from $x_i$ through $x_j$ is given by the recurrence

\[
\begin{aligned}
c(i+1, i) &= w_{i+1,i} = q_i \\
c(i, j) &= w_{i,j} + \min_{i \le k \le j} \bigl( c(i, k-1) + c(k+1, j) \bigr) \quad \text{when } i \le j.
\end{aligned}
\]

On the HMM2 model, the cost of an optimum BST $T^*_{i,j}$ over the same subset of keys
is

\[
\begin{aligned}
c(i+1, i, n_1, n_2) &= q_i \\
c(i, j, n_1, n_2) &= \mu(\phi(x_k)) \cdot w_{i,j}
  + \min_{i \le k \le j} \Bigl( c(i, k-1, n_1^{(L)}, n_2^{(L)}) + c(k+1, j, n_1^{(R)}, n_2^{(R)}) \Bigr)
\end{aligned}
\tag{3.7}
\]

where

• the root $x_k$ is stored in memory location $\phi(x_k)$ of cost $\mu(\phi(x_k))$;

• out of the $n_1$ cheap locations available to the subtree, $n_1^{(L)}$ are given to the left
subtree and $n_1^{(R)}$ are given to the right subtree;

• the $n_2$ expensive locations available are assigned as $n_2^{(L)}$ to the left subtree and $n_2^{(R)}$
to the right subtree;

• if $n_1 > 0$, then $x_k$ is stored in a location of cost $c_1$, and $n_1^{(L)} + n_1^{(R)} = n_1 - 1$ and
$n_2^{(L)} + n_2^{(R)} = n_2$;

• otherwise, $n_1 = 0$ and $n_2 = j - i + 1$, so $x_k$ is stored in a location of cost $c_2$, and
the entire subtree is stored in the second memory level; the optimum subtree $T^*_{i,j}$
is the same as the optimum one on the RAM model constructed during phase I.

The first phase of the algorithm, PROCEDURE TL-PHASE-I, constructs arrays $C$ and
$R$, where $C[i, j]$ is the cost of an optimum BST (on the uniform-cost model) over the
subset of keys from $x_i$ through $x_j$; $R[i, j]$ is the index of the root of such an optimum
BST.

The second phase, PROCEDURE TL-PHASE-II, constructs arrays $c$ and $r$, such that
$c[i, j, n_1, n_2]$ is the cost of an optimum BST over the subset of keys from $x_i$ through $x_j$ with
$n_1$ and $n_2$ available memory locations of cost $c_1$ and $c_2$ respectively, and $n_1 + n_2 = j - i + 1$;
$r[i, j, n_1, n_2]$ is the index of the root of such an optimum BST.

The structure of the tree can be retrieved in $O(n)$ time from the array $r$ at the end
of the execution of ALGORITHM TWOLEVEL.

3.4.1.1 algorithm TwoLevel 

ALGORITHM TWOLEVEL first calls PROCEDURE TL-PHASE-I. Recall that PROCEDURE
TL-PHASE-I constructs all subtrees $T_{i,j}$ that contain few enough nodes to fit
entirely in any one level in the memory hierarchy, specifically the largest level. Entries
in table $R[i, j]$ are filled by PROCEDURE TL-PHASE-I.

PROCEDURE TL-PHASE-II computes optimum subtrees where $n_1$ and $n_2$ are greater
than zero. Therefore, prior to invoking PROCEDURE TL-PHASE-II, ALGORITHM TWOLEVEL
initializes the entries in table $r[i, j, n_1, n_2]$ when $n_1 = 0$ and when $n_2 = 0$ from the entries
in table $R[i, j]$.

3.4.1.2 Procedure TL-phase-I 

PROCEDURE TL-PHASE-I is identical to ALGORITHM K2 from section 2.1.1.1 except
that the outermost loop involving $d$ iterates only $\max\{m_1, m_2\}$ times in PROCEDURE
TL-PHASE-I. PROCEDURE TL-PHASE-I computes optimum subtrees in a bottom-up fashion.
It fills entries in the tables $C[i, j]$ and $R[i, j]$ by diagonals, i.e., in the order of increasing
$d = j - i$. The size of the largest subtree that fits entirely in one memory level is
$\max\{m_1, m_2\}$, corresponding to $d = \max\{m_1, m_2\} - 1$.
ALGORITHM TWOLEVEL:

Call PROCEDURE TL-PHASE-I (figure 3.5)
If either m_1 = 0 or m_2 = 0,
    then we are done.
Otherwise,
    Initialize, for all i, j such that 1 ≤ i ≤ j ≤ n:
        r[i, j, 0, j − i + 1] ← R[i, j]
        r[i, j, j − i + 1, 0] ← R[i, j]
        c[i, j, 0, j − i + 1] ← c_2 · C[i, j]
        c[i, j, j − i + 1, 0] ← c_1 · C[i, j]
    Call PROCEDURE TL-PHASE-II (figure 3.6)

Figure 3.4 ALGORITHM TWOLEVEL



For every $i, j$ with $j - i = d$, TL-PHASE-I computes the cost of a subtree $T'$ with
$x_k$ at the root for all $k$ such that $R[i, j-1] \le k \le R[i+1, j]$. Note that $(j - 1) - i =
j - (i + 1) = d - 1$; therefore, entries $R[i, j-1]$ and $R[i+1, j]$ are already available during
this iteration of the outermost loop. The optimum choice for the root of this subtree is
the value of $k$ for which the cost of the subtree is minimized.
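
The restricted root search can be sketched in a few lines. The code below is an illustration of the uniform-cost dynamic program with an optional root restriction, not the thesis's own listing; the function name `optimum_bst`, the flag `restrict_roots`, and the 1-based arrays (with `p[0]` as an unused placeholder) are assumptions made here. Unlike TL-PHASE-I, it runs $d$ all the way to $n - 1$ rather than stopping at $\max\{m_1, m_2\} - 1$.

```python
from math import inf

def optimum_bst(p, q, restrict_roots=True):
    """Cost and root table of an optimum BST on the uniform-cost model.

    p[1..n] are success weights and q[0..n] are failure weights.
    With restrict_roots=True the root of T[i, j] is searched only in the
    range R[i, j-1] .. R[i+1, j]; otherwise all k in i..j are tried.
    """
    n = len(p) - 1
    w = [[0] * (n + 1) for _ in range(n + 2)]
    cost = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[None] * (n + 1) for _ in range(n + 2)]
    for i in range(n + 1):                  # empty subtrees: c(i+1, i) = q_i
        w[i + 1][i] = cost[i + 1][i] = q[i]
    for d in range(n):                      # fill the tables by diagonals
        for i in range(1, n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            if restrict_roots and d > 0:
                lo, hi = root[i][j - 1], root[i + 1][j]
            else:
                lo, hi = i, j
            cost[i][j] = inf
            for k in range(lo, hi + 1):
                c = w[i][j] + cost[i][k - 1] + cost[k + 1][j]
                if c < cost[i][j]:
                    cost[i][j], root[i][j] = c, k
    return cost[1][n], root
```

Calling the function with `restrict_roots=False` gives the plain cubic-time search, which is a convenient cross-check for the restricted version on small inputs.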

3.4.1.3 Procedure TL-phase-II 

PROCEDURE TL-PHASE-II is an implementation of ALGORITHM PARTS in section
3.3.4 for the special case when $h = 2$. PROCEDURE TL-PHASE-II also constructs
increasingly larger optimum subtrees in an iterative fashion. The additional complexity in
this algorithm arises from the fact that for each possible choice of root $x_k$ of the subtree
$T_{i,j}$, there are also a number of different ways to partition the available cheap locations
between the left and right subtrees of $x_k$.

There are $n_1$ cheap locations and $n_2$ expensive locations available to store the
subtree $T_{i,j}$. If $n_1 \ge 1$, then the root $x_k$ is stored in a cheap location. The remaining
cheap locations are partitioned into two, with $n_1^{(L)}$ locations assigned to the left subtree
and $n_1^{(R)}$ locations assigned to the right subtree. $n_2^{(L)}$ and $n_2^{(R)}$ denote the number of
expensive locations available to the left and right subtrees respectively. Since the
algorithm constructs optimum subtrees in increasing order of $j - i$, the two table entries
$r[i, k-1, n_1^{(L)}, n_2^{(L)}]$ and $r[k+1, j, n_1^{(R)}, n_2^{(R)}]$ are already available during the iteration
when $j - i = d$ because $(k - 1) - i < d$ and $j - (k + 1) < d$.
PROCEDURE TL-PHASE-I:

(Initialization phase.)
for i := 0 to n
    C[i + 1, i] ← w_{i+1,i} = q_i
    R[i + 1, i] ← NIL

for d := 0 to max{m_1, m_2} − 1
    for i := 1 to n − d
        j ← i + d
        (Number of nodes in this subtree: j − i + 1 = d + 1.)
        C[i, j] ← ∞
        R[i, j] ← NIL
        for k := R[i, j − 1] to R[i + 1, j]
            (*) T' is the tree with root x_k, left subtree T[i, k − 1],
                and right subtree T[k + 1, j]
            c' ← w_{i,j} + C[i, k − 1] + C[k + 1, j]
            if c' < C[i, j]
                R[i, j] ← k
                C[i, j] ← c'

Figure 3.5 PROCEDURE TL-PHASE-I
PROCEDURE TL-PHASE-II:

for d := min{m_1, m_2} to n − 1
    for n_1 := 0 to min{m_1, d + 1}
        n_2 ← (d + 1) − n_1
        for i := 1 to n − d
            j ← i + d
            c[i, j, n_1, n_2] ← ∞
            r[i, j, n_1, n_2] ← NIL
            for k := i to j
                (Number of nodes in the left and right subtrees.)
                l ← k − i
                r ← j − k
                if n_1 ≥ 1
                    Use one cheap location for the root;
                    (Now, there are only n_1 − 1 cheap locations available.)
                    for n_1^(L) := max{0, (n_1 − 1) − r} to min{l, (n_1 − 1)}
                        n_1^(R) ← (n_1 − 1) − n_1^(L)
                        n_2^(L) ← l − n_1^(L)
                        n_2^(R) ← r − n_1^(R)
                        (*) T' is the tree with root x_k,
                            left subtree T[i, k − 1, n_1^(L), n_2^(L)], and
                            right subtree T[k + 1, j, n_1^(R), n_2^(R)]
                        c' ← c_1 · w_{i,j} + c[i, k − 1, n_1^(L), n_2^(L)] + c[k + 1, j, n_1^(R), n_2^(R)]
                        if c' < c[i, j, n_1, n_2]
                            r[i, j, n_1, n_2] ← k
                            c[i, j, n_1, n_2] ← c'

Figure 3.6 PROCEDURE TL-PHASE-II
3.4.1.4 Correctness of algorithm TwoLevel

ALGORITHM TWOLEVEL calls PROCEDURE TL-PHASE-I and PROCEDURE TL-PHASE-II,
which implement dynamic programming to build larger and larger subtrees of minimum
cost. The principle of optimality clearly applies to the problem of constructing
an optimum tree: every subtree of an optimal tree must also be optimal given the same
number of memory locations of each kind. Therefore, ALGORITHM TWOLEVEL correctly
computes an optimum BST over the entire set of keys.

3.4.1.5 Running time of algorithm TwoLevel 

The running time of ALGORITHM TWOLEVEL is proportional to the number of times
overall that the lines marked with a star (*) in TL-PHASE-I and TL-PHASE-II are
executed.

Let $m = \min\{m_1, m_2\}$ be the size of the smaller of the two memory levels. The
number of times that the line in PROCEDURE TL-PHASE-I marked with a star (*) is executed
is

\[
\begin{aligned}
\sum_{d=0}^{n-m} \sum_{i=1}^{n-d} \bigl( R[i+1, j] - R[i, j-1] + 1 \bigr)
  &= \sum_{d=0}^{n-m} \bigl( R[n-d+1, n+1] - R[1, d-1] + n - d \bigr) \\
  &\le \sum_{d=0}^{n-m} 2n \\
  &= 2n(n - m + 1) \\
  &= O(n(n - m)).
\end{aligned}
\]

The number of times that the line (*) in PROCEDURE TL-PHASE-II is executed is at
most

\[
\sum_{d=m}^{n-1} \ \sum_{n_1=0}^{\min\{m_1, d+1\}} \ \sum_{i=1}^{n-d} \ \sum_{k=i}^{i+d} \bigl( \text{number of values taken by } n_1^{(L)} \bigr).
\]

A simple calculation shows that the two summations involving $d$ and $i$ iterate $O(n - m)$
times each, the summation over $n_1$ iterates $O(n)$ times, and the innermost summation has
$O(n)$ terms, so that the number of times that the starred line is executed is
$O(mn^2(n - m)^2)$.

Therefore, the total running time of ALGORITHM TWOLEVEL is

\[
T(n, m) = O\bigl(n(n - m) + mn^2(n - m)^2\bigr) = O\bigl(mn^2(n - m)^2\bigr). \tag{3.8}
\]

In general, $T(n, m) = O(n^5)$, but $T(n, m) = o(n^5)$ if $m = o(n)$, and $T(n, m) = O(n^4)$
if $m = O(1)$, i.e., the smaller level in memory has only a constant number of memory
locations. This case would arise in architectures in which the faster memory, such as the
primary cache, is limited in size due to practical considerations such as monetary cost
and the cost of cache coherence protocols.

3.4.2 Constructing a nearly optimum BST 

In this section, we consider the problem of constructing a BST on the HMM2 model 
that is close to optimum. 

3.4.2.1 An approximation algorithm 

The following top-down recursive algorithm, ALGORITHM APPROX-BST of figures
3.7 and 3.8, is due to Mehlhorn [Meh84]. Its analysis is adapted from the same source.
The intuition behind ALGORITHM APPROX-BST is to choose the root $x_k$ of the subtree
$T_{i,j}$ so that the weights $w_{i,k-1}$ and $w_{k+1,j}$ of the left and right subtrees are as close to equal
as possible. In other words, we choose the key $x_k$ to be the root such that $|w_{i,k-1} - w_{k+1,j}|$
is as small as possible. Then, we recursively construct the left and right subtrees.

Once the tree $T$ has been constructed by the above heuristic, we optimally assign the
nodes of $T$ to memory locations using Lemma [H] in $O(n \log n)$ additional time.

ALGORITHM APPROX-BST implements the above heuristic. The parameter $l$ represents
the depth of the recursion; initially $l = 0$, and $l$ is incremented by one whenever the
algorithm recursively calls itself. The parameters $\mathrm{low}_l$ and $\mathrm{high}_l$ represent the lower
and upper bounds on the range of the probability distribution spanned by the keys $x_i$
through $x_j$. Initially, $\mathrm{low}_l = 0$ and $\mathrm{high}_l = 1$ because the keys $x_1$ through $x_n$ span the
entire range $[0, 1]$. Whenever the root $x_k$ is chosen, according to the above heuristic, to
lie in the middle of this range, i.e., $\mathrm{med}_l = (\mathrm{low}_l + \mathrm{high}_l)/2$, the span of the keys in the
left subtree is bounded by $[\mathrm{low}_l, \mathrm{med}_l]$ and the span of the keys in the right subtree is
bounded by $[\mathrm{med}_l, \mathrm{high}_l]$. These are the ranges passed as parameters to the two recursive
calls of the algorithm.
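
The bisection heuristic can be sketched compactly. The code below is an illustrative rendering, not the listing from the figures: it takes $s_0 = q_0/2$ and $s_i = s_{i-1} + q_{i-1}/2 + p_i + q_i/2$ as the cumulative weights, finds $k$ by a plain linear scan instead of the faster searches discussed in the running-time analysis, and omits the depth parameter $l$. The name `approx_bst`, the 1-based arrays (with `p[0]` as an unused placeholder), and the nested-tuple tree representation are assumptions made here.

```python
def approx_bst(p, q):
    """Build a nearly optimum BST by repeatedly bisecting the weight range.

    p[1..n] are success probabilities and q[0..n] are failure probabilities;
    together they are assumed to sum to 1. Returns nested tuples
    (k, left, right), where None stands for an external node.
    """
    n = len(p) - 1
    s = [q[0] / 2]
    for i in range(1, n + 1):
        s.append(s[-1] + q[i - 1] / 2 + p[i] + q[i] / 2)

    def build(i, j, low, high):
        if i > j:
            return None                     # external node
        if i == j:
            return (i, None, None)          # base case: a single key
        med = (low + high) / 2
        # Smallest k in [i, j-1] with s_k >= med, or k = j if there is none.
        k = next((t for t in range(i, j) if s[t] >= med), j)
        return (k, build(i, k - 1, low, med), build(k + 1, j, med, high))

    return build(1, n, 0.0, 1.0)

print(approx_bst([0, 0.2, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1]))
```

With uniform probabilities the sketch returns a balanced tree, while a heavily skewed distribution pushes the heavy key to the root.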
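Define

\[
\begin{aligned}
s_0 &= \frac{q_0}{2} \\
s_i &= s_{i-1} + \frac{q_{i-1}}{2} + p_i + \frac{q_i}{2} \quad \text{for } 1 \le i \le n.
\end{aligned}
\tag{3.9}
\]

By definition,

\[
s_i = q_0 + \sum_{k=1}^{i} p_k + \sum_{k=1}^{i-1} q_k + \frac{q_i}{2}. \tag{3.10}
\]

Therefore,

\[
\begin{aligned}
s_j - s_{i-1} &= w_{1,j} - w_{1,i-1} + \frac{q_{i-1}}{2} - \frac{q_j}{2} \\
              &= w_{i,j} - \frac{q_{i-1}}{2} - \frac{q_j}{2} \quad \text{by definition of } w_{i,j}.
\end{aligned}
\tag{3.11}
\]

In Lemma 13 below, we show that at each level in the recursion, the input parameters
to APPROX-BST() satisfy $\mathrm{low}_l \le s_{i-1} \le s_j \le \mathrm{high}_l$.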

3.4.2.2 Analysis of the running time 

We prove that the running time of ALGORITHM APPROX-BST is $O(n)$. Clearly, the
space complexity is also linear.

The running time $t(n)$ of ALGORITHM APPROX-BST can be expressed by the recurrence

\[
t(n) = s(n) + \max_{1 \le k \le n} \bigl[ t(k-1) + t(n-k) \bigr] \tag{3.12}
\]

where $s(n)$ is the time to compute the index $k$ satisfying conditions (i), (ii), and (iii)
given in the algorithm, and $t(k - 1)$ and $t(n - k)$ are the times for the two recursive calls.
We can implement the search for $k$ as a binary search. Initially, choose $r = \lfloor (i + j)/2 \rfloor$.
If $s_r \ge \mathrm{med}_l$, then $k \le r$, otherwise $k > r$, and we proceed recursively. Since this
binary search takes $O(\log(j - i)) = O(\log n)$ time, the overall running time of ALGORITHM
APPROX-BST is

\[
\begin{aligned}
t(n) &= O(\log n) + \max_{1 \le k \le n} \bigl[ t(k-1) + t(n-k) \bigr] \\
     &\le O(\log n) + t(0) + t(n-1) \\
     &= O(n \log n).
\end{aligned}
\]
APPROX-BST(i, j, l, low_l, high_l):

med_l ← (low_l + high_l)/2;

Case 1: (the base case)
if i = j
    Return the tree with three nodes consisting of x_i at the root
    and the external nodes z_{i−1} and z_j as the left and right subtrees respectively.

Otherwise, if i ≠ j, then find k satisfying all the following three conditions:
    (i) i ≤ k ≤ j
    (ii) either k = i, or k > i and s_{k−1} ≤ med_l
    (iii) either k = j, or k < j and s_k ≥ med_l
(Lemma 11 guarantees that such a k always exists.)

(Continued in figure 3.8)

Figure 3.7 ALGORITHM APPROX-BST

(Continued from figure 3.7)

Case 2a:
if k = i
    Return the tree with x_i at the root, the external node z_{i−1} as the left subtree,
    and the recursively constructed subtree T_{i+1,j} as the right subtree,
    obtained by calling APPROX-BST(i + 1, j, l + 1, med_l, high_l).

Case 2b:
if k = j
    Return the tree with x_j at the root, the external node z_j as the right subtree,
    and the recursively constructed subtree T_{i,j−1} as the left subtree,
    obtained by calling APPROX-BST(i, j − 1, l + 1, low_l, med_l).

Case 2c:
if i < k < j
    Return the tree with x_k at the root,
    and recursively construct the left and right subtrees,
    T_{i,k−1} and T_{k+1,j} respectively:
        call APPROX-BST(i, k − 1, l + 1, low_l, med_l) recursively
        to construct the left subtree.
        call APPROX-BST(k + 1, j, l + 1, med_l, high_l) recursively
        to construct the right subtree.

Figure 3.8 ALGORITHM APPROX-BST (cont'd.)
However, if we use exponential search and then binary search to determine the value
of $k$, then the overall running time can be reduced to $O(n)$ as follows. Intuitively, an
exponential search followed by a binary search finds the correct value of $k$ in $O(\log(k - i))$
time instead of $O(\log(j - i))$ time.

Initially, choose $r = \lfloor (i + j)/2 \rfloor$. Now, if $s_r \ge \mathrm{med}_l$ we know $k \le r$, otherwise $k > r$.

Consider the case when $k \in \{i, i + 1, i + 2, \ldots, r = \lfloor (i + j)/2 \rfloor\}$. An exponential
search for $k$ in this interval proceeds by trying all values of $k$ from $i$, $i + 2^0$, $i + 2^1$, $i + 2^2$,
and so on up to $i + 2^{\lceil \lg(r - i) \rceil} \ge r$. Let $g$ be the smallest integer such that $s_{i+2^g} \ge \mathrm{med}_l$,
i.e., $i + 2^{g-1} < k \le i + 2^g$, or $2^g \ge k - i > 2^{g-1}$. Hence, $\lg(k - i) > g - 1$, so that the
number of comparisons made by this exponential search is $g < 1 + \lg(k - i)$. Now, we
determine the exact value of $k$ by a binary search on the interval $i + 2^{g-1} + 1$ through
$i + 2^g$, which takes $\lg(2^g - 2^{g-1}) + 1 < g + 1 < \lg(k - i) + 2$ comparisons.

Likewise, when $k \in \{r + 1, r + 2, \ldots, j\}$, a search for $k$ in this interval using exponential
and then binary search takes $\lg(j - k) + 2$ comparisons.

Therefore, the time $s(n)$ taken to determine the value of $k$ is at most $d(2 + \lg(\min\{k -
i, j - k\}))$, where $d$ is a constant.
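
The two-phase search on the left half can be sketched as follows. This is a minimal illustration, assuming a nondecreasing list `s` and a guarantee that `s[r] >= target`; the function name `find_first_at_least` and its parameters are hypothetical and do not appear in the original algorithm. The search on the right half is symmetric and is omitted.

```python
def find_first_at_least(s, i, r, target):
    """Smallest k in [i, r] with s[k] >= target.

    Assumes s is nondecreasing on [i, r] and s[r] >= target.
    The exponential phase probes i, i+1, i+2, i+4, ..., so the number of
    comparisons grows with log(k - i) rather than log(r - i).
    """
    if s[i] >= target:
        return i
    step = 1
    lo = i                          # invariant: s[lo] < target
    while i + step < r and s[i + step] < target:
        lo = i + step
        step *= 2
    hi = min(i + step, r)           # invariant: s[hi] >= target
    # Binary phase on the half-open interval (lo, hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if s[mid] >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

When the answer lies close to `i`, the exponential phase stops after a few probes, which is what makes the total search cost over the whole recursion linear.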

Hence, the running time of ALGORITHM APPROX-BST is proportional to

\[
t(n) = \max_{1 \le k \le n} \bigl( t(k-1) + t(n-k) + d(2 + \lg \min\{k, n-k\}) + f \bigr)
\]

where $f$ is a constant. By the symmetry of the expression $t(k - 1) + t(n - k)$, we have

\[
t(n) \le \max_{1 \le k \le (n+1)/2} \bigl( t(k-1) + t(n-k) + d(2 + \lg k) + f \bigr). \tag{3.13}
\]

We prove that $t(n) \le (3d + f)n - d \lg(n + 1)$ by induction on $n$. This is clearly true
for $n = 0$. Applying the induction hypothesis in the recurrence in equation (3.13), we
have

\[
\begin{aligned}
t(n) &\le \max_{1 \le k \le (n+1)/2} \bigl( (3d+f)(k-1) - d \lg k + (3d+f)(n-k) \\
     &\qquad\qquad\qquad - d \lg(n-k+1) + d(2 + \lg k) + f \bigr) \\
     &= (3d+f)(n-1) + \max_{1 \le k \le (n+1)/2} \bigl( -d \lg(n-k+1) + 2d + f \bigr) \\
     &= (3d+f)n + \max_{1 \le k \le (n+1)/2} \bigl( -d \lg(n-k+1) - d \bigr).
\end{aligned}
\]

The expression $-d(1 + \lg(n - k + 1))$ is always negative and its value is maximum in the
range $1 \le k \le (n + 1)/2$ at $k = (n + 1)/2$. Therefore,

\[
\begin{aligned}
t(n) &\le (3d+f)n - d(1 + \lg((n+1)/2)) \\
     &= (3d+f)n - d \lg(n+1).
\end{aligned}
\]

Hence, the running time of ALGORITHM APPROX-BST is $O(t(n)) = O(n)$.
Of course, if we choose to construct an optimal memory assignment for $T$, then the
total running time is $O(n + n \log n) = O(n \log n)$.
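
The linear bound can also be checked numerically by evaluating recurrence (3.13) with equality. The sketch below fixes the constants at the hypothetical values $d = f = 1$ and takes $t(0) = 0$; the names `t` and `bound` are chosen here for illustration.

```python
from functools import lru_cache
from math import log2

D, F = 1.0, 1.0     # hypothetical values for the constants d and f

@lru_cache(maxsize=None)
def t(n):
    """Recurrence (3.13) taken with equality, with t(0) = 0 assumed."""
    if n == 0:
        return 0.0
    return max(t(k - 1) + t(n - k) + D * (2 + log2(k)) + F
               for k in range(1, (n + 1) // 2 + 1))

def bound(n):
    """The claimed bound (3d + f) n - d lg(n + 1)."""
    return (3 * D + F) * n - D * log2(n + 1)

for n in (1, 3, 7, 15, 100):
    print(n, t(n), bound(n))
```

For these constants the bound is met with equality at $n = 1, 3, 7$, which suggests that the induction hypothesis cannot be tightened much further.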

3.4.2.3 Quality of approximation 

Let $T$ denote the binary search tree constructed by ALGORITHM APPROX-BST. In the
rest of this section, we prove an upper bound on how much the cost of $T$ is worse than the
cost of an optimum BST. The following analysis applies whether we choose to construct
an optimal memory assignment or to use the heuristic of ALGORITHM APPROX-BST.

We now derive an upper bound on the cost of the tree, $T$, constructed by ALGORITHM
APPROX-BST.

Let $\delta(x_i)$ denote the depth of the internal node $x_i$, $1 \le i \le n$, and let $\delta(z_j)$ denote
the depth of the external node $z_j$, $0 \le j \le n$, in $T$. (Recall that the depth of a node is
the number of nodes on the path from the root to that node; the depth of the root is 1.)

Lemma 11. If the parameters $i$, $j$, $\mathrm{low}_l$, and $\mathrm{high}_l$ to APPROX-BST() satisfy

\[
\mathrm{low}_l \le s_{i-1} \le s_j \le \mathrm{high}_l,
\]

then a $k$ satisfying conditions (i), (ii), and (iii) stated in the algorithm always exists.

Proof: If $s_i \ge \mathrm{med}_l$, then choosing $k = i$ satisfies conditions (i), (ii), and (iii). Likewise,
if $s_{j-1} \le \mathrm{med}_l$, then $k = j$ satisfies all the conditions. Otherwise, if $s_i < \mathrm{med}_l < s_{j-1}$,
then since $s_i \le s_{i+1} \le \cdots \le s_{j-1} \le s_j$, consider the first $k$, with $k > i$, such that
$s_{k-1} \le \mathrm{med}_l$ and $s_k \ge \mathrm{med}_l$. Then $k < j$ and $s_k \ge \mathrm{med}_l$, and this value of $k$ satisfies all
three conditions. □

Lemma 12. The parameters of a call to APPROX-BST satisfy

\[
\mathrm{high}_l = \mathrm{low}_l + 2^{-l}.
\]

Proof: The proof is by induction on $l$. The initial call to APPROX-BST with $l = 0$ has
$\mathrm{low}_l = 0$ and $\mathrm{high}_l = 1$. Whenever the algorithm recursively constructs the left subtree
$T_{i,k-1}$ in cases 2b and 2c, we have $\mathrm{low}_{l+1} = \mathrm{low}_l$ and $\mathrm{high}_{l+1} = \mathrm{med}_l = (\mathrm{low}_l + \mathrm{high}_l)/2 =
(2\,\mathrm{low}_l + 2^{-l})/2 = \mathrm{low}_l + 2^{-(l+1)} = \mathrm{low}_{l+1} + 2^{-(l+1)}$. On the other hand, whenever the
algorithm recursively constructs the right subtree $T_{k+1,j}$, in cases 2a and 2c, we have
$\mathrm{high}_{l+1} = \mathrm{high}_l$ and $\mathrm{low}_{l+1} = \mathrm{med}_l = \mathrm{high}_{l+1} - 2^{-(l+1)}$. □

Lemma 13. The parameters of a call Approx-BST($i$, $j$, $l$, $\mathrm{low}_l$, $\mathrm{high}_l$) satisfy

\[ \mathrm{low}_l \le s_{i-1} \le s_j \le \mathrm{high}_l. \]

Proof: The initial call is Approx-BST(1, $n$, 0, 0, 1). Therefore, $s_{i-1} = s_0 = q_0 \ge 0$ and
$s_j = s_n = 1 - q_0/2 - q_n/2 < 1$. Thus, the parameters to the initial call to Approx-BST()
satisfy the given condition.

The rest of the proof follows by induction on $l$. In case 2a, the algorithm chooses $k = i$
because $s_i \ge \mathrm{med}_l$, and recursively constructs the right subtree over the subset of keys
from $x_{i+1}$ through $x_j$. Therefore, we have $\mathrm{low}_{l+1} = \mathrm{med}_l \le s_i \le s_j \le \mathrm{high}_l = \mathrm{high}_{l+1}$.

In case 2b, the algorithm chooses $k = j$ because $s_{j-1} \le \mathrm{med}_l$, and then recursively
constructs the left subtree over the subset of keys from $x_i$ through $x_{j-1}$. Therefore, we
have $\mathrm{low}_{l+1} = \mathrm{low}_l \le s_{i-1} \le s_{j-1} \le \mathrm{med}_l = \mathrm{high}_{l+1}$.

In case 2c, algorithm Approx-BST chooses $k$ such that $s_{k-1} \le \mathrm{med}_l \le s_k$ and
$i < k < j$. Therefore, during the recursive call to construct the left subtree over the subset
of keys from $x_i$ through $x_{k-1}$, we have $\mathrm{low}_{l+1} = \mathrm{low}_l \le s_{i-1} \le s_{k-1} \le \mathrm{med}_l = \mathrm{high}_{l+1}$.
During the recursive call to construct the right subtree over the subset of keys from $x_{k+1}$
through $x_j$, we have $\mathrm{low}_{l+1} = \mathrm{med}_l \le s_k \le s_j \le \mathrm{high}_l = \mathrm{high}_{l+1}$. □

Lemma 14. During a call to Approx-BST with parameter $l$, if an internal node $x_k$ is
created, then $\delta(x_k) = l + 1$, and if an external node $z_k$ is created, then $\delta(z_k) = l + 2$.

Proof: The proof is by a simple induction on $l$. The root, at depth 1, is created when
$l = 0$. The recursive calls to construct the left and right subtrees are made with the
parameter $l$ incremented by 1. The depth of the external node created in cases 2a and
2b is one more than the depth of its parent, and therefore equal to $l + 2$. □
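
The invariants of Lemmas 12 through 14 can be checked mechanically on a small instance. The sketch below is not the thesis's Approx-BST; it is a minimal bisection procedure written under assumptions that are not taken from this section: the prefix values are defined by $s_0 = q_0/2$ and $s_j = s_{j-1} + q_{j-1}/2 + p_j + q_j/2$ (the choice used in Mehlhorn's bisection heuristic), a single remaining key ($i = j$) becomes a node with two external children, and $k$ is found by a plain linear scan that follows the case analysis in the proof of Lemma 11. The name `bisection_depths` and the list layout are hypothetical.

```python
from fractions import Fraction


def bisection_depths(p, q):
    """Build a BST by repeated bisection and return (depth_x, depth_z).

    p[1..n] are key probabilities (p[0] is unused padding) and q[0..n] are
    gap probabilities, all exact Fractions that sum to 1.
    """
    n = len(p) - 1
    # Assumed prefix values: s[j] = q0 + p1 + q1 + ... + pj + qj / 2.
    s = [q[0] / 2]
    for j in range(1, n + 1):
        s.append(s[j - 1] + q[j - 1] / 2 + p[j] + q[j] / 2)

    depth_x = [None] * (n + 1)
    depth_z = [None] * (n + 1)

    def build(i, j, l, low, high):
        # Invariants stated in Lemma 12 and Lemma 13.
        assert high == low + Fraction(1, 2 ** l)
        assert low <= s[i - 1] <= s[j] <= high
        med = (low + high) / 2
        if i == j:
            k = i                      # assumed base case: one key left
        elif s[i] >= med:
            k = i                      # case 2a
        elif s[j - 1] <= med:
            k = j                      # case 2b
        else:
            k = i + 1                  # case 2c: first k > i with s[k] >= med
            while s[k] < med:
                k += 1
        depth_x[k] = l + 1             # depth claimed by Lemma 14
        if k == i:
            depth_z[i - 1] = l + 2     # left child is an external node
        else:
            build(i, k - 1, l + 1, low, med)
        if k == j:
            depth_z[j] = l + 2         # right child is an external node
        else:
            build(k + 1, j, l + 1, med, high)

    build(1, n, 0, Fraction(0), Fraction(1))
    return depth_x, depth_z
```

Exact rational arithmetic is used so that the comparisons against $\mathrm{med}_l$ are not disturbed by rounding; the linear scan for $k$ keeps the sketch short and makes no attempt to match the linear total running time of the algorithm analyzed above.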
Lemma 15. For every internal node $x_k$ such that $1 \le k \le n$,

\[ p_k \le 2^{-\delta(x_k)+1}, \]

and for every external node $z_k$ such that $0 \le k \le n$,

\[ q_k \le 2^{-\delta(z_k)+2}. \]

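Proof: Let the internal node $x_k$ be created during a call to Approx-BST($i$, $j$, $l$, $\mathrm{low}_l$, $\mathrm{high}_l$).
Then,

\[
\begin{aligned}
s_j - s_{i-1} &\le \mathrm{high}_l - \mathrm{low}_l && \text{by Lemma 13} \\
              &= 2^{-l} && \text{by Lemma 12,} \\
s_j - s_{i-1} &= w_{1,j} - \frac{q_j}{2} - w_{1,i-1} + \frac{q_{i-1}}{2} && \text{by definition of } s_{i-1} \text{ and } s_j \\
              &\ge p_k && \text{by definition (3.10), because } i \le k \le j.
\end{aligned}
\]

Therefore, by Lemmas 13 and 12, for the internal node $x_k$ ($i \le k \le j$) with probability
$p_k$, we have $p_k \le s_j - s_{i-1} \le 2^{-l} = 2^{-\delta(x_k)+1}$ by Lemma 14.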

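Likewise, for the external node $z_k$ ($i - 1 \le k \le j$) with corresponding probability of
access $q_k$, we have

\[ s_j - s_{i-1} = \sum_{r=i}^{j} p_r + \frac{q_{i-1}}{2} + \sum_{r=i}^{j-1} q_r + \frac{q_j}{2}. \]

Therefore, since $i - 1 \le k \le j$, we have

\[
\begin{aligned}
q_k &\le 2(s_j - s_{i-1}) \\
    &\le 2(\mathrm{high}_l - \mathrm{low}_l) && \text{by Lemma 13} \\
    &= 2^{-l+1} && \text{by Lemma 12} \\
    &= 2^{-\delta(z_k)+2} && \text{by Lemma 14.}
\end{aligned}
\]

□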



Lemma 16. For every internal node $x_k$ such that $1 \le k \le n$,

\[ \delta(x_k) \le \left\lfloor \lg \frac{1}{p_k} \right\rfloor + 1, \]

and for every external node $z_k$ such that $0 \le k \le n$,

\[ \delta(z_k) \le \left\lfloor \lg \frac{1}{q_k} \right\rfloor + 2. \]

Proof: Lemma 15 shows that $p_k \le 2^{-\delta(x_k)+1}$. Taking logarithms of both sides to the
base 2, we have $\lg p_k \le -\delta(x_k) + 1$; therefore, $\delta(x_k) \le -\lg p_k + 1 = \lg(1/p_k) + 1$. Since
the depth of $x_k$ is an integer, we conclude that $\delta(x_k) \le \lfloor \lg(1/p_k) \rfloor + 1$. Likewise, for
external node $z_k$, $\delta(z_k) \le \lfloor \lg(1/q_k) \rfloor + 2$. □

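Now we derive an upper bound on cost(T). Let $H$ denote the entropy of the probability
distribution $q_0, p_1, q_1, \ldots, p_n, q_n$ [CT91], i.e.,

\[ H = \sum_{i=1}^{n} p_i \lg \frac{1}{p_i} + \sum_{j=0}^{n} q_j \lg \frac{1}{q_j}. \qquad (3.14) \]

If all the internal nodes of T were stored in the expensive locations, then the cost of
T would be at most

\[
\begin{aligned}
\sum_{i=1}^{n} c_2\, p_i\, \delta(x_i) + \sum_{j=0}^{n} c_2\, q_j\, (\delta(z_j) - 1)
 &\le c_2 \left( \sum_{i=1}^{n} p_i \left( \lg \frac{1}{p_i} + 1 \right) + \sum_{j=0}^{n} q_j \left( \lg \frac{1}{q_j} + 1 \right) \right) && \text{by Lemma 16} \\
 &= c_2 \left( \left( \sum_{i=1}^{n} p_i \lg \frac{1}{p_i} + \sum_{j=0}^{n} q_j \lg \frac{1}{q_j} \right) + \left( \sum_{i=1}^{n} p_i + \sum_{j=0}^{n} q_j \right) \right) \\
 &= c_2 (H + 1) \qquad (3.15)
\end{aligned}
\]

by definition (3.14) and because $\sum_{i=1}^{n} p_i + \sum_{j=0}^{n} q_j = 1$.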
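
As a numerical illustration of (3.14) and (3.15), the following sketch evaluates the entropy and the resulting upper bound for a given distribution. The function names and the value of `c2` are hypothetical, and terms with zero probability are skipped on the usual convention that they contribute nothing to the entropy. The sample frequencies are borrowed from the three-key instance of section 4.2.4 and normalized.

```python
import math


def entropy(p, q):
    """Entropy H of (3.14) for key probabilities p and gap probabilities q."""
    return sum(x * math.log2(1 / x) for x in list(p) + list(q) if x > 0)


def upper_bound(p, q, c2):
    """Right-hand side of (3.15): c2 * (H + 1)."""
    return c2 * (entropy(p, q) + 1)


# Frequencies of the three-key instance in section 4.2.4, normalized to sum to 1.
freq_p = [98, 72, 95]
freq_q = [49, 20, 22, 84]
total = sum(freq_p) + sum(freq_q)
p = [f / total for f in freq_p]
q = [f / total for f in freq_q]

# c2 = 15 is an arbitrary illustrative cost for an expensive location.
print(entropy(p, q), upper_bound(p, q, c2=15))
```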

3.4.2.4 Lower bounds 

The following lower bounds are known for the cost of an optimum binary search tree
T* on the standard uniform-cost RAM model.

Theorem 17 (Mehlhorn [Meh75]).

\[ \mathrm{cost}(T^*) \ge \frac{H}{\lg 3}. \]

Theorem 18 (De Prisco, De Santis [dPdS96]).

\[ \mathrm{cost}(T^*) \ge H - 1 - \left( \sum_{i=1}^{n} p_i \right) \bigl( \lg \lg (n + 1) - 1 \bigr). \]

Theorem 19 (De Prisco, De Santis [dPdS96]).

\[ \mathrm{cost}(T^*) \ge H + H \lg H - (H + 1) \lg (H + 1). \]

The lower bounds of Theorems 17 and 19 are expressed only in terms of $H$, the
entropy of the probability distribution. The smaller the entropy, the tighter the bound
of Theorem 17. Theorem 19 improves on Mehlhorn's lower bound for $H \gtrsim 15$. Theorem
18 assumes knowledge of $n$, and proves a lower bound better than that of Theorem 17
for large enough values of $H$.
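
To see where the entropy-only bounds cross over, the three lower bounds can be evaluated directly. The sketch below is only an illustration; the function names are hypothetical, and the formula used for Theorem 18 follows the statement of that theorem as read from this section.

```python
import math


def mehlhorn_lower(H):
    """Theorem 17: H / lg 3."""
    return H / math.log2(3)


def entropy_only_lower(H):
    """Theorem 19: H + H lg H - (H + 1) lg(H + 1), for H > 0."""
    return H + H * math.log2(H) - (H + 1) * math.log2(H + 1)


def size_aware_lower(H, n, sum_p):
    """Theorem 18: H - 1 - sum_p * (lg lg(n + 1) - 1), for n >= 1."""
    return H - 1 - sum_p * (math.log2(math.log2(n + 1)) - 1)


for H in (5, 14, 15, 30):
    print(H, round(mehlhorn_lower(H), 3), round(entropy_only_lower(H), 3))
```

In this evaluation the bound of Theorem 19 is still below Mehlhorn's bound at $H = 14$ and above it at $H = 15$, which agrees with the threshold quoted above.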

3.4.2.5 Approximation bound 

Corollary 20. The algorithm Approx-BST constructs the tree T such that

\[ \mathrm{cost}(T) - \mathrm{cost}(T^*) \le (c_2 - c_1) H + c_1 \bigl( (H + 1) \lg (H + 1) - H \lg H \bigr) + c_2. \]

Proof: Theorem 19 immediately implies a lower bound of $c_1 (H + H \lg H - (H + 1) \lg (H + 1))$
on the cost of T*. The result then follows from equation (3.15). □

For large enough values of $H$, $H + 1 \approx H$ so that $\lg(H + 1) \approx \lg H$; hence,
$(H + 1) \lg(H + 1) - H \lg H \approx \lg H$. Thus, we have

\[ \mathrm{cost}(T) - \mathrm{cost}(T^*) \le (c_2 - c_1) H + c_1 \lg H. \qquad (3.16) \]

When $c_1 = c_2 = 1$ as in the uniform-cost RAM model, equation (3.16) is the same as the
approximation bound obtained by Mehlhorn [Meh84].
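
The approximation step used to pass from Corollary 20 to (3.16) can be made explicit. The following identity is a supplementary remark rather than part of the original argument; it uses only the elementary inequality $\ln(1 + x) \le x$ for $x \ge 0$.

```latex
\begin{align*}
(H+1)\lg(H+1) - H\lg H
  &= \lg(H+1) + H\bigl(\lg(H+1) - \lg H\bigr) \\
  &= \lg(H+1) + H\lg\Bigl(1 + \frac{1}{H}\Bigr) \\
  &\le \lg(H+1) + \lg e
   = \lg H + \lg\Bigl(1 + \frac{1}{H}\Bigr) + \lg e .
\end{align*}
```

For $H \ge 1$ the term therefore exceeds $\lg H$ by at most $1 + \lg e < 2.45$, a bounded additive amount that the approximation leading to (3.16) neglects.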
CHAPTER 4

Conclusions and Open Problems 



4.1 Conclusions 

The table of figure 4.1 summarizes our results for the problem of constructing an
optimum binary search tree over a set of $n$ keys and the corresponding probabilities of
access, on the general HMM model with an arbitrary number of levels in the memory
hierarchy and on the two-level HMM2 model. Recall that $h$ is the number of memory
levels, and $m_l$ is the number of memory locations in level $l$ for $1 \le l \le h$.

We see from the table of figure 4.1 that algorithm Parts is efficient when $h$ is a small constant.
The running time of algorithm Parts is independent of the sizes of the different memory
levels. On the other hand, the running time of algorithm Trunks is polynomial
in $n$ precisely when $n - m_h = \sum_{l=1}^{h-1} m_l$ is a constant, even if $h$ is large. Therefore,
for instance, algorithm Parts would be appropriate for a three-level memory hierarchy,
where the binary search tree has to be stored in cache, main memory, and on disk.
Algorithm Trunks would be more efficient when the memory hierarchy consists of
many levels and the last memory level is extremely large. This is because algorithm
Trunks uses the speed-up technique due to Knuth [Knu71, Knu73] and Yao [Yao82] to
take advantage of the fact that large subtrees of the BST will in fact be stored entirely
in the last memory level.

When $h$ is large and $n - m_h$ is not a constant, the relatively simple top-down algorithm,
algorithm Split, is the most efficient. In particular, when $h = \Omega(n / \log n)$, it is faster
than algorithm Parts.

For the HMM2 model, we have the hybrid algorithm, algorithm TwoLevel, with 
running time 0{n{n — m) + mn'^{n — 171)"^), where m = minjmi, 7712} is the size of the 
smaller of the two memory levels ($m \le n/2$). Procedure TL-Phase-II of algorithm
TwoLevel is an implementation of algorithm Parts for a special case. The running
time of algorithm TwoLevel is 0{rt') in the worst case, the same as the worst-case
running time of algorithm Parts for $h = 2$. However, if $m = o(n)$, then algorithm
TwoLevel outperforms algorithm Parts; in particular, if $m = \Theta(1)$, then
the running time of algorithm TwoLevel is 0{n^).

Figure 4.1 Summary of results

None of our algorithms depend on the actual costs of accessing a memory location 
in different levels. We state as an open problem below whether it is possible to take 
advantage of knowledge of the relative costs of memory accesses to design a more efficient 
algorithm for constructing optimum BSTs. 

For the problem of approximating an optimum BST on the HMM2 model, we have
a linear-time algorithm, algorithm Approx-BST of section 3.4.2, that constructs the
tree T such that

\[ \mathrm{cost}(T) - \mathrm{cost}(T^*) \le (c_2 - c_1) H + c_1 \bigl( (H + 1) \lg (H + 1) - H \lg H \bigr) + c_2, \]

where cost(T*) is the cost of an optimum BST.

4.2 Open problems 

4.2.1 Efficient heuristics 

We noted above that our algorithms do not assume any relationship between the costs
$c_l$ of accessing a memory location in level $l$, $1 \le l \le h$. It should be possible to design an
algorithm, more efficient than any of the algorithms in this thesis, that takes advantage
of knowledge of the memory costs to construct an optimum binary search tree. The
memory cost function $\mu(a) = \Theta(\log a)$ would be especially interesting in this context.

4.2.2 NP-hardness 

Conjecture 21. The problem of constructing a BST of minimum cost on the HMM
with $h = \Omega(n)$ levels in the memory hierarchy is NP-hard.

The dynamic programming algorithm, algorithm Parts, of section 3.3.4 runs in
time 0(n'^"'"^), which is efficient only if $h = O(1)$. We conjecture that when $h = \Omega(n)$,
the extra complexity of the number of different ways to store the keys in memory, in
addition to computing the structure of an optimum BST, makes the problem hard.

4.2.3 An algorithm efficient on the HMM 

Although we are interested in the problem of constructing a BST and storing it in 
memory such that the cost on the HMM is minimized, we analyze the running times of 
our algorithms on the RAM model. It would be interesting to analyze the pattern of 
memory accesses made by the algorithms to compute an optimum BST, and optimize 
the running time of each of the algorithms when run on the HMM model. 

4.2.4 BSTs optimum on both the RAM and the HMM 

When is the structure of the optimum BST the same on the HMM as on the RAM 
model? In other words, is it possible to characterize when the minimum-cost tree is the 
one that is optimum when the memory configuration is uniform? 

The following small example demonstrates that, in general, the structure of an opti- 
mum tree on the uniform-cost RAM model can be very different from the structure of 
an optimum tree on the HMM. To discover this example, we used a computer program 
to perform an exhaustive search. 

Consider an instance of the problem of constructing an optimum BST on the HMM2
model, with $n = 3$ keys. The number of times $p_i$ that the $i$-th key $x_i$ is accessed, for
$1 \le i \le 3$, and the number of times $q_j$ that the search argument lies between $x_j$ and
$x_{j+1}$, for $0 \le j \le 3$, are:

$p_i = (98, 72, 95)$
$q_j = (49, 20, 22, 84)$

The $p_i$'s and $q_j$'s are the frequencies of access. They are not normalized to add up to 1,
but such a transformation could easily be made without changing the optimum solution.

Figure 4.2 An optimum BST on the unit-cost RAM model.

In this instance of the HMM model, there is one memory location each whose cost is
in {4, 12, 14, 44, 66, 76, 82}. The optimum BST on the RAM model is shown in figure
4.2. Its cost on the RAM model with each location of unit cost is 983, while the cost of
the same tree on this instance of the HMM model is 16,752.

On the other hand, the BST over the same set of keys and frequencies that is optimum
on this instance of the HMM model is shown in figure 4.3. Its cost on the unit-cost RAM
model is 990 and on the above instance of the HMM model is 16,730. In figure 4.3, the
nodes of the tree are labeled with the frequency of the corresponding key, and the cost
of the memory location where the node is stored in square brackets.
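
The four costs quoted above can be reproduced with a short exhaustive search. The sketch below is not the program used for the thesis; it rests on assumptions about the cost model that are inferred rather than quoted: every node, internal or external, occupies one memory location, reaching a node costs the sum of the location costs on the path from the root, and nodes may be assigned to locations freely, so the heaviest subtrees go to the cheapest locations. These conventions are adopted because they yield exactly the figures 983, 16,752, 990, and 16,730. All function names are hypothetical.

```python
def trees(i, j):
    """Yield every BST shape over keys x_i..x_j; an empty range is the leaf z_{i-1}."""
    if i > j:
        yield ("z", i - 1)
        return
    for k in range(i, j + 1):
        for left in trees(i, k - 1):
            for right in trees(k + 1, j):
                yield (left, k, right)


def subtree_weights(tree, p, q, out):
    """Append the total frequency under every node to out; return this node's total."""
    if tree[0] == "z":
        w = q[tree[1]]
    else:
        left, k, right = tree
        w = p[k] + subtree_weights(left, p, q, out) + subtree_weights(right, p, q, out)
    out.append(w)
    return w


def costs(tree, p, q, mem):
    """Return (unit-cost RAM cost, HMM cost under the best memory assignment)."""
    ws = []
    subtree_weights(tree, p, q, ws)
    ram = sum(ws)  # each node is paid for once per access passing through it
    # Heaviest subtrees go to the cheapest locations (rearrangement inequality).
    hmm = sum(w * c for w, c in zip(sorted(ws, reverse=True), sorted(mem)))
    return ram, hmm


p = [0, 98, 72, 95]                # p[1..3]; p[0] is padding
q = [49, 20, 22, 84]               # q[0..3]
mem = [4, 12, 14, 44, 66, 76, 82]  # one location of each cost

all_costs = [costs(t, p, q, mem) for t in trees(1, 3)]
best_ram = min(all_costs, key=lambda rc: rc[0])
best_hmm = min(all_costs, key=lambda rc: rc[1])
print(best_ram, best_hmm)  # (983, 16752) (990, 16730)
```

Under these assumptions the tree that minimizes the unit-cost total has $x_2$ at the root, while the tree that minimizes the HMM total has $x_3$ at the root, $x_1$ as its left child, and $x_2$ as the right child of $x_1$.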

4.2.5 A monotonicity principle 

The dynamic programming algorithms, algorithm Parts of section 3.3.4 and algorithm
TwoLevel of section 3.4.1, iterate through the large number of possible ways
of partitioning the available memory locations between left and right subtrees. It would
be interesting to discover a monotonicity principle, similar to the concave quadrangle
inequality, which would reduce the number of different options tried by the algorithms.

Figure 4.3 An optimum BST on the HMM model.

For the problem of constructing an optimum BST on the HMM2 model with only
two different memory costs, we were able to disprove the following conjectures by giving
counter-examples:

Conjecture 22 (Disproved). If $x_k$ is the root of an optimum subtree over the subset
of keys $x_i$ through $x_j$ in which $m$ cheap locations are assigned to the left subtree, then the
root of an optimum subtree over the same subset of keys in which $m + 1$ cheap locations
are assigned to the left subtree must have index no smaller than $k$.

Counter-example: Consider an instance of the problem of constructing an optimum
BST on the HMM2 model, with $n = 7$ keys. In this instance, there are $m_1 = 5$ cheap
memory locations such that a single access to a cheap location costs $c_1 = 5$, and $m_2 = 10$
expensive locations such that a single access to an expensive location has cost $c_2 = 15$.
The number of times $p_i$ that the $i$-th key $x_i$ is accessed, for $1 \le i \le 7$, and the number
of times $q_j$ that the search argument lies between $x_j$ and $x_{j+1}$, for $0 \le j \le 7$, are:

$p_i = (2, 2, 2, 10, 4, 9, 5)$
$q_j = (6, 6, 7, 4, 1, 1, 9, 6)$

The $p_i$'s and $q_j$'s are the frequencies of access; they could easily be normalized to add up
to 1.

An exhaustive search shows that the optimum BST with $n_l = 0$ cheap locations
assigned to the left subtree (and therefore, 4 cheap locations assigned to the right subtree),
with total cost 1,890, has $x_3$ at the root. The optimum BST with $n_l = 1$ cheap locations
assigned to the left subtree (and 3 cheap locations assigned to the right subtree), with
total cost 1,770, has $x_2$ at the root. This example disproves conjecture 22.



Conjecture 23 (Disproved). If $x_k$ is the root of an optimum subtree over the subset
of keys $x_i$ through $x_j$ in which $m$ cheap locations are assigned to the left subtree, then
in the optimum subtree over the same subset of keys but with $x_{k+1}$ at the root, the left
subtree must have assigned no fewer than $m$ cheap locations.

Counter-example: Consider an instance of the problem again with $n = 7$ keys. In
this instance, there are $m_1 = 5$ cheap memory locations such that a single access to a
cheap location costs $c_1 = 9$, and $m_2 = 10$ expensive locations such that a single access
to an expensive location has cost $c_2 = 27$. The number of times $p_i$ that the $i$-th key
$x_i$ is accessed, for $1 \le i \le 7$, and the number of times $q_j$ that the search argument lies
between $x_j$ and $x_{j+1}$, for $0 \le j \le 7$, are:

$p_i = (7, 3, 9, 3, 3, 6, 3)$
$q_j = (4, 9, 4, 5, 5, 7, 5, 9)$

As a result of an exhaustive search, we see that the optimum BST with $x_4$ at the
root, with total cost 3,969, has 3 cheap locations assigned to the left subtree, and 1 cheap
location assigned to the right subtree. However, the optimum BST with $x_5$ at the root,
with total cost 4,068, has only 2 cheap locations assigned to the left subtree, and 2 cheap
locations assigned to the right subtree. This example disproves conjecture 23.



Conjecture 24 (Disproved). [Conjecture of unimodality] The cost of an optimum
BST with a fixed root $x_k$ is a unimodal function of the number of cheap locations assigned
to the left subtree.

Conjecture 24 would imply that we could substantially improve the running time of
algorithm Parts of section 3.3.4. The $h - 1$ innermost loops of algorithm Parts
each perform a linear search for the optimum way to partition the available memory
locations from each level between the left and right subtrees. If the conjecture were true,
we could perform a discrete unimodal search instead and reduce the overall running time
to 0{{\ogn)'^'^ -n^).
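
For reference, a discrete unimodal search of the kind mentioned above can be sketched as follows. This is a generic binary search on the sign of the forward difference, not code from the thesis, and the two sample sequences are made-up numbers rather than data from figure 4.4. The function name `unimodal_argmin` is hypothetical.

```python
def unimodal_argmin(f, lo, hi):
    """Return an index in [lo, hi] that minimizes f, assuming f strictly
    decreases and then strictly increases on that range.

    Uses O(log(hi - lo)) evaluations of f instead of a linear scan.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):
            hi = mid          # already on the increasing side: minimum is at or left of mid
        else:
            lo = mid + 1      # still decreasing: minimum is strictly right of mid
    return lo


# A valley-shaped sequence: the search finds the true minimum.
valley = [9, 7, 4, 2, 3, 8, 10]
print(unimodal_argmin(lambda i: valley[i], 0, len(valley) - 1))

# A sequence with an interior bump (index 4 is above both neighbours):
# the search stops at a local minimum and misses the global one at index 5.
bumpy = [10, 8, 6, 5, 7, 4, 9]
print(unimodal_argmin(lambda i: bumpy[i], 0, len(bumpy) - 1))
```

The second sequence shows why the speed-up is only sound when the cost really is unimodal: a single interior bump is enough for the search to return a suboptimal partition.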

Counter-example: A counter-example to conjecture 24 is the binary search tree over
$n = 15$ keys, where the frequencies of access are:

$p_i = (2, 2, 9, 2, 1, 4, 10, 9, 9, 7, 5, 6, 9, 8, 10)$
$q_j = (1, 8, 8, 1, 3, 4, 6, 6, 6, 3, 3, 10, 8, 3, 4, 3)$

The instance of the HMM model has $m_1 = 7$ cheap memory locations of cost $c_1 = 7$
and $m_2 = 24$ expensive locations of cost $c_2 = 16$. Through an exhaustive search, we
determined that the cost of an optimum binary search tree with xs at the root exhibits
the behavior shown in the graph of figure 4.4 as the number $n_l$ of cheap locations
assigned to the left subtree varies from 0 through 6. (As the root, xs is always assigned
to a cheap location.) The graph of figure 4.4 plots the costs of the optimum left and right
subtrees of the root and their sum, as the number of cheap locations assigned to the left
subtree increases, or equivalently, as the number of cheap locations assigned to the right
subtree decreases. (Note that the total cost of the BST is only a constant more than the
sum of the costs of the left and right subtrees since the root is fixed.) We see from the
graph that the cost of an optimum BST with $n_l = 4$ is greater than that for $n_l = 3$
and $n_l = 5$; thus, the cost is not a unimodal function of $n_l$.

4.2.6 Dependence on the parameter h 

Downey and Fellows [DF99] define a class of parameterized problems, called fixed-parameter
tractable (FPT).

Definition 25 (Downey, Fellows [DF99]). A parameterized problem $L \subseteq \Sigma^* \times \Sigma^*$
is fixed-parameter tractable if there is an algorithm that correctly decides, for input
$(x, y) \in \Sigma^* \times \Sigma^*$, whether $(x, y) \in L$ in time $f(k)\, n^{\alpha}$, where $n$ is the size of the main part
of the input $x$, $|x| = n$, $k$ is the integer parameter which is the length of $y$, $|y| = k$, $\alpha$ is
a constant independent of $k$, and $f$ is an arbitrary function.
Figure 4.4 The cost of an optimum BST is not a unimodal function.

The best algorithm we have for the general problem, i.e., for arbitrary $\mu$, is algorithm
Parts of section 3.3.4, which runs in time 0[p!^^'^\. Consider the case where
all $h$ levels in the memory hierarchy have roughly the same number of locations, i.e.,
$m_1 = m_2 = \cdots = m_{h-1} = \lfloor n/h \rfloor$ and $m_h = \lceil n/h \rceil$. If the number of levels $h$ is a
parameter to the problem, it remains open whether this problem is (strongly uniformly)
fixed-parameter tractable: is there an algorithm to construct an optimum BST that runs
in time $O(f(h)\, n^{\alpha})$ where $\alpha$ is a constant independent of both $h$ and $n$? For instance, is
there an algorithm with running time $O(2^h n^{\alpha})$? Recall that we have a top-down algorithm
(algorithm Split of section 3.3.6) that runs in time $O(2^n)$ for the case $h = n$.
A positive answer to this question would imply that it is feasible to construct optimum
BSTs over a large set of keys for a larger range of values of $h$, in particular, even when
$h = O(\log n)$.






References 



[AACS87] A. Aggarwal, B. Alpern, A. K. Chandra, and M. Snir. A model for hierar- 
chical memory. In Proceedings of the 19th ACM Symposium on the Theory of 
Computing, pages 305-314, 1987. 

[ABCP98] B. Awerbuch, B. Berger, L. Cowen, and D. Peleg. Near-linear time construc- 
tion of sparse neighborhood covers. SIAM Journal on Computing, 28(1):263- 
277, 1998. 

[AC88] A. Aggarwal and A. K. Chandra. Virtual memory algorithms. In Proceedings 
of the 20th ACM Symposium on the Theory of Computing, pages 173-185, 
1988. Preliminary Version. 

[ACFS94] B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy 
model of computation. Algorithmica, 12:72-109, 1994. 

[ACS87] A. Aggarwal, A. K. Chandra, and M. Snir. Hierarchical memory with block 
transfer. In Proceedings of the 28th IEEE Symposium on Foundations of 
Computer Science, pages 204-216, 1987. 

[ACS90] A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of 
PRAMs. Theoretical Computer Science, 71:3-28, 1990. 

[AV88] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and 
related problems. Communications of the ACM, 31(9):1116-1127, September 
1988. 

[AVL62] G. M. Adel'son-Vel'skii and E. M. Landis. An algorithm for the organization 
of information. Soviet Mathematics Doklady, 3:1259-1263, 1962. 

[BC94] D. P. Bovet and P. Crescenzi. Introduction to the Theory of Complexity. 
Prentice Hall, 1994. 

[CGG+95] Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and 
J. S. Vitter. External-memory graph algorithms. In Proceedings of the Sixth 
Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, CA, 
1995), pages 139-149, 1995. 

[CJLM99] S. Chatterjee, V. V. Jain, A. R. Lebeck, and S. Mundhra. Nonlinear array 
layouts for hierarchical memory systems. In Proceedings of the ACM Inter- 
national Conference on Supercomputing, Rhodes, Greece, June 1999. 






[CKP+96] D. E. Culler, R. M. Karp, D. Patterson, A. Sahay, E. E. Santos, K. E.
Schauser, R. Subramonian, and T. von Eicken. LogP: A practical model
of parallel computation. Communications of the ACM, 39(11):78-85, 1996.

[CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms.
MIT Press, 1990.

[CS] S. Chatterjee and S. Sen. Cache-efficient matrix transposition. [Online]
ftp://ftp.cs.unc.edu/pub/users/sc/papers/hpcaOO.pdf [September 17, 2000].

[CT91] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 
1991. 

[DF99] R. G. Downey and M. R. Fellows. Parameterized Complexity. Monographs in 
Computer Science. Springer, 1999. 

[dPdS96] R. de Prisco and A. de Santis. New lower bounds on the cost of binary search 
trees. Theoretical Computer Science, 156(l-2):315-325, 1996. 

[GI99] J. Gil and A. Itai. How to pack trees. Journal of Algorithms, 32(2):108-132,
1999.

[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to
the Theory of NP-Completeness. W. H. Freeman and Co., 1979.

[GS73] D. D. Grossman and H. F. Silverman. Placement of records on a secondary 
storage device to minimize access time. Journal of the ACM, 20(3):429-438, 
July 1973. 

[HK81] J. Hong and H. Kung. I/O complexity: The red-blue pebble game. In Proceedings
of ACM Symposium on Theory of Computing, 1981.

[HLH92] E. Hagersten, A. Landin, and S. Haridi. DDM — a cache-only memory archi- 
tecture. IEEE Computer, pages 44-54, September 1992. 

[HP96] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative 
Approach. Morgan Kaufmann, 2nd edition, 1996. 

[HR76] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is 
NP-complete. Information Processing Letters, 5(1):15-17, May 1976. 

[HT71] T. C. Hu and A. C. Tucker. Optimal computer search trees and variable-length 
alphabetical codes. SIAM Journal on Applied Mathematics, 21(4):514-532, 
December 1971. 

[Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. 
Proceedings of the Institute of Radio Engineers, 40(9):1098-1101, September 
1952. 

[JW94] B. H. H. Juurlink and H. A. G. Wijshoff. The parallel hierarchical memory 
model. In Algorithm Theory — SWAT, number 824 in Lecture Notes in 
Computer Science, pages 240-251. Springer-Verlag, 1994.






[Knu71] D. E. Knuth. Optimum binary search trees. Acta Informatica, 1:14-25, 1971.

[Knu73] D. E. Knuth. The Art of Computer Programming, vol. 3: Sorting and Searching.
Addison-Wesley, 1973.

[LL96] A. LaMarca and R. E. Ladner. The influence of caches on the performance
of heaps. Journal of Experimental Algorithmics, 1(4), 1996. [Online]
http://www.jea.acm.org/1996/LaMarcaInfluence/ [September 17, 2000].

[LL99] A. LaMarca and R. E. Ladner. The influence of caches on the performance
of sorting. Journal of Algorithms, 31(1):66-104, 1999.

[Mak95] L. Mak. The Power of Parallel Time. PhD thesis, University of Illinois at
Urbana-Champaign, May 1995.

[Meh75] K. Mehlhorn. Nearly optimal binary search trees. Acta Informatica, 5:287-295,
1975.

[Meh84] K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching.
EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1984.

[Nag97] S. V. Nagaraj. Optimal binary search trees. Theoretical Computer Science,
188:1-44, 1997.

[NGV96] M. H. Nodine, M. T. Goodrich, and J. S. Vitter. Blocking for external graph
searching. Algorithmica, 16(2):181-214, August 1996.

[Pap95] C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1995.

[PS85] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction.
Texts and Monographs in Computer Science. Springer-Verlag, 1985.

[PU87] C. H. Papadimitriou and J. D. Ullman. A communication-time tradeoff. SIAM
Journal on Computing, 16(4):639-646, August 1987.

[Reg96] K. W. Regan. Linear time and memory-efficient computation. SIAM Journal
on Computing, 25(1):133-168, February 1996.

[Sav98] J. E. Savage. Models of Computation: Exploring the Power of Computing.
Addison-Wesley, 1998.

[Smi82] A. J. Smith. Cache memories. ACM Computing Surveys, 14(3):473-530,
September 1982.

[ST85] D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal
of the Association for Computing Machinery, 32(3):652-686, July 1985.

[Val89] L. G. Valiant. Bulk synchronous parallel computers. In M. Reeve and S. E.
Zenith, editors, Parallel Processing and Artificial Intelligence. Wiley, 1989.
ISBN 0-471-92497-0.

[Val90] L. G. Valiant. A bridging model for parallel computation. Communications
of the ACM, 33(8):103-111, August 1990.






[Vit] J. S. Vitter. External memory algorithms and data structures: Dealing with
massive data. To appear in ACM Computing Surveys.

[Wil87] A. W. Wilson Jr. Hierarchical cache/bus architecture for shared memory 
multiprocessors. In Proceedings of the Fourteenth International Symposium 
on Computer Architecture, pages 244-252, June 1987. 

[Yao82] F. F. Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic 
Discrete Methods, 3(4):532-540, 1982. 






